Albert-Jan Roskam wrote:
    Today I used chardet.detect in the repl and it returned windows-1252
    (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
    chardet as a script (which uses UniversalDetector) this returned
    MacRoman. Isn't chardet.detect the correct way? I've used this method many
    times.
    # Interpreter
    >>> contents = open(FILENAME, "rb").read()
    >>> chardet.detect(content)

Is that copy-and-pasted from the terminal, or retyped with possible transcription errors? As written, you've assigned the file's contents (a bytes object) to `contents`, but passed `content` (no "s") to `chardet.detect`, so the result depends on whatever was previously bound to `content`.

    {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
    # Terminal
    $ python -m chardet FILENAME
    FILENAME: MacRoman with confidence 0.7167379080370483
    Thanks!
    Albert-Jan
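
Separately, a UnicodeDecodeError under Windows-1252 is a useful signal in itself: Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so a strict decode rejects them. Here is a minimal stdlib-only sketch for checking which guesses survive a strict decode of the same bytes. The helper names and the candidate list are my own, purely illustrative, and not part of chardet.

```python
# Stdlib-only sketch: report which candidate encodings can decode a byte
# string without raising. The helper names and the default candidate list
# are illustrative assumptions, not part of chardet.


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` with errors='strict'."""
    try:
        data.decode(encoding)  # errors='strict' is the default
    except UnicodeDecodeError:
        return False
    return True


def viable_encodings(data: bytes, candidates=("utf-8", "cp1252", "mac_roman")):
    """Return, in order, the candidates that decode `data` without error."""
    return [enc for enc in candidates if decodes_cleanly(data, enc)]


if __name__ == "__main__":
    # Hypothetical sample: 0x81 is undefined in cp1252 and is an invalid
    # start byte in UTF-8, but maps to 'Å' in MacRoman.
    sample = b"caf\x81"
    print(viable_encodings(sample))  # ['mac_roman']
```

Against your file that would be `viable_encodings(open(FILENAME, "rb").read())`. Bear in mind a clean decode only rules encodings out; it doesn't confirm one. MacRoman defines all 256 byte values, so it can never fail this test, and the decoded text still needs checking by eye.
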

--
Mark.
--
https://mail.python.org/mailman/listinfo/python-list
