Re: Chardet oddity

Mark Bourne via Python-list Thu, 24 Oct 2024 07:23:49 -0700

Albert-Jan Roskam wrote:

    Today I used chardet.detect in the repl and it returned windows-1252
    (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
    chardet as a script (which uses UniversalDetector) this returned
    MacRoman. Isn't chardet.detect the correct way? I've used this method many
    times.
    # Interpreter
    >>> contents = open(FILENAME, "rb").read()
    >>> chardet.detect(content)

Is that copied and pasted from the terminal, or retyped with possible transcription errors? As written, you've assigned the file's bytes to `contents`, but passed `content` (with no "s") to `chardet.detect`, so the result would depend on whatever was previously assigned to `content`.
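A minimal sketch of a less error-prone check in the REPL, assuming the file fits in memory: read the raw bytes into one name, keep using that same name, and try decoding them with each encoding chardet suggested. `try_decodings` and `check_file` are hypothetical helpers, not part of chardet; `cp1252` and `mac_roman` are Python's standard codec names for Windows-1252 and MacRoman. Note that a clean decode only fails to rule an encoding out: every byte value is defined in MacRoman, so decoding with it never raises, whereas Windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unassigned.

```python
from __future__ import annotations

from pathlib import Path


def try_decodings(data: bytes, candidates: list[str]) -> dict[str, str | None]:
    """Map each codec name to None if `data` decodes cleanly with it,
    or to the UnicodeDecodeError message if decoding fails."""
    results: dict[str, str | None] = {}
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError as exc:
            results[encoding] = str(exc)
        else:
            results[encoding] = None
    return results


def check_file(filename: str) -> dict[str, str | None]:
    # Hypothetical helper: read the raw bytes once and reuse the *same*
    # name, so the check runs on this file rather than a stale variable.
    contents = Path(filename).read_bytes()
    return try_decodings(contents, ["cp1252", "mac_roman"])


if __name__ == "__main__":
    # 0x8E is 'é' in MacRoman (but 'Ž' in Windows-1252);
    # 0x81 is 'Å' in MacRoman but unassigned in Windows-1252.
    sample = b"caf\x8e \x81"
    print(try_decodings(sample, ["cp1252", "mac_roman"]))
```

If chardet's Windows-1252 guess raises here while MacRoman does not, that matches the UnicodeDecodeError you saw, though it still doesn't prove MacRoman is the right answer.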

    {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
    # Terminal
    $ python -m chardet FILENAME
    FILENAME: MacRoman with confidence 0.7167379080370483
    Thanks!
    Albert-Jan


--
Mark.
--
https://mail.python.org/mailman/listinfo/python-list
