Albert-Jan Roskam wrote:
Today I used chardet.detect in the REPL and it returned windows-1252 (incorrect, because decoding with it later raised a UnicodeDecodeError). When I ran chardet as a script (which feeds the file to UniversalDetector line by line) it returned MacRoman. Isn't chardet.detect the correct way? I've used this method many times.

# Interpreter
>>> contents = open(FILENAME, "rb").read()
>>> chardet.detect(content)
Is that copied and pasted from the terminal, or retyped with possible transcription errors? As written, you've assigned the file's contents (the bytes returned by `.read()`) to `contents`, but passed `content` (with no "s") to `chardet.detect`, so the result would depend on whatever was previously assigned to `content`.
{'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}

# Terminal
$ python -m chardet FILENAME
FILENAME: MacRoman with confidence 0.7167379080370483

Thanks!
Albert-Jan
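The UnicodeDecodeError is itself a clue. Python's cp1252 codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, while mac_roman maps all 256 byte values, so a MacRoman decode never fails even when it is the wrong guess. The following is a minimal stdlib-only sketch of that check; the helper names `decodes_as` and `undefined_cp1252_bytes` and the sample bytes are hypothetical, made up for illustration, and it does not call chardet at all.

```python
# Byte values that Python's cp1252 codec leaves undefined; decoding any of
# them with "cp1252" raises UnicodeDecodeError.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def undefined_cp1252_bytes(data: bytes) -> list:
    """Return (offset, byte value) pairs that cp1252 cannot decode."""
    return [(i, b) for i, b in enumerate(data) if b in CP1252_UNDEFINED]


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` without error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: "caf" + 0x8E (e-acute in MacRoman), a space, then 0x8D.
sample = b"caf\x8e \x8d"

print(decodes_as(sample, "cp1252"))     # False: 0x8D is undefined in cp1252
print(decodes_as(sample, "mac_roman"))  # True: mac_roman maps every byte
print(undefined_cp1252_bytes(sample))   # [(5, 141)]
```

Running `undefined_cp1252_bytes` over the real file's bytes would show where the windows-1252 decode fails. Note that a clean MacRoman decode only proves the bytes are decodable, not that MacRoman is the right encoding.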
--
Mark.

--
https://mail.python.org/mailman/listinfo/python-list