Management tool
Edit file: universaldetector.cpython-38.pyc
Module containing the UniversalDetector detector class, which is the primary
class a user of ``chardet`` should use.

:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco

Source path recorded in the bytecode:
/builddir/build/BUILDROOT/alt-python38-pip-22.2.1-2.el8.x86_64/opt/alt/python38/lib/python3.8/site-packages/pip/_vendor/chardet/universaldetector.py

Imported names: ``CharSetGroupProber``, ``InputState``, ``LanguageFilter``,
``ProbingState``, ``EscCharSetProber``, ``Latin1Prober``, ``MBCSGroupProber``,
``SBCSGroupProber``, ``UTF1632Prober``.

Class ``UniversalDetector``

The ``UniversalDetector`` class underlies the ``chardet.detect`` function and
coordinates all of the different charset probers.

To get a ``dict`` containing an encoding and its confidence, you can simply
run:

.. code::

    u = UniversalDetector()
    u.feed(some_bytes)
    u.close()
    detected = u.result

Encoding-name pairs stored on the class:

- iso-8859-1: Windows-1252
- iso-8859-2: Windows-1250
- iso-8859-5: Windows-1251
- iso-8859-6: Windows-1256
- iso-8859-7: Windows-1253
- iso-8859-8: Windows-1255
- iso-8859-9: Windows-1254
- iso-8859-13: Windows-1257

Attributes set in ``__init__``: ``_esc_charset_prober``, ``_utf1632_prober``,
``_charset_probers``, ``result``, ``done``, ``_got_data``, ``_input_state``,
``_last_char``, ``lang_filter``, ``logger`` (from
``logging.getLogger(__name__)``) and ``_has_win_bytes``. ``__init__`` then
calls ``reset``.

Properties: ``input_state``, ``has_win_bytes``, ``charset_probers``.

``reset``

Reset the UniversalDetector and all of its probers back to their initial
states. This is called by ``__init__``, so you only need to call this
directly in between analyses of different documents.

The ``result`` dict has the keys ``encoding``, ``confidence`` and
``language``.

``feed``

Takes a chunk of a document and feeds it through all of the relevant charset
probers.

After calling ``feed``, you can check the value of the ``done`` attribute to
see if you need to continue feeding the ``UniversalDetector`` more data, or
if it has made a prediction (in the ``result`` attribute).

.. note::

    You should always call ``close`` when you're done feeding in your
    document if ``done`` is not already ``True``.
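The ``feed`` bytecode above contains the string ``UTF-8-SIG``, which suggests that the first chunk is checked for a byte-order mark before any prober runs. The following is a minimal, standard-library-only sketch of that idea, not chardet's own code. The function name ``sniff_bom``, the ``_BOMS`` table and the confidence value of ``1.0`` are assumptions made for illustration; only the ``encoding``, ``confidence`` and ``language`` keys are taken from the dump.

```python
import codecs
from typing import Optional

# Hypothetical lookup table (not taken from chardet).
# The UTF-32 BOMs come before the UTF-16 ones because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "UTF-8-SIG"),
    (codecs.BOM_UTF32_LE, "UTF-32"),
    (codecs.BOM_UTF32_BE, "UTF-32"),
    (codecs.BOM_UTF16_LE, "UTF-16"),
    (codecs.BOM_UTF16_BE, "UTF-16"),
)


def sniff_bom(first_chunk: bytes) -> Optional[dict]:
    """Return a result-style dict if the chunk starts with a known BOM."""
    for bom, name in _BOMS:
        if first_chunk.startswith(bom):
            # Confidence of 1.0 is an assumption for this sketch.
            return {"encoding": name, "confidence": 1.0, "language": ""}
    # No BOM found: a real detector would go on to statistical probing.
    return None
```

A caller would try ``sniff_bom`` on the first chunk only and fall back to ``UniversalDetector`` when it returns ``None``.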