On 12/10/13 6:50 PM, Dan Stromberg wrote:
> On Tue, Dec 10, 2013 at 1:07 PM, Petite Abeille <petite.abei...@gmail.com> wrote:
>
>> On Dec 10, 2013, at 6:25 AM, Dan Stromberg <drsali...@gmail.com> wrote:
>>
>>> The IMDB flat text file probably came the closest, but it appears to have encoding issues; it's apparently nearly windows-1255, but not quite.
>>
>> It's ISO-8859-1.
>
> Thanks - that reads well from CPython 3.3. Now the question becomes: Why did chardet tell me it was windows-1255? :)
It probably told you it was Windows-1252 (I'm assuming the last 5 is a typo).
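Windows-1252 is a superset of ISO-8859-1's printable characters: bytes 0x00-0x7F and 0xA0-0xFF mean the same thing in both, and Windows-1252 additionally assigns printable characters (curly quotes, the euro sign, and so on) to most of 0x80-0x9F, a range ISO-8859-1 uses only for control codes. So any real text that is correct ISO-8859-1 is also correct Windows-1252. In addition, it's not uncommon to find text labelled as ISO-8859-1 that actually contains bytes in that 0x80-0x9F range, which makes it Windows-1252.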
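A short Python 3 sketch of that relationship, using the standard `iso-8859-1` and `cp1252` codecs. The sample bytes are made up for illustration, and `looks_like_cp1252()` is a hypothetical heuristic for the mislabelled-text case, not a description of how chardet actually decides.

```python
# Sketch: how ISO-8859-1 (Latin-1) and Windows-1252 (cp1252) differ.
# The sample bytes and the looks_like_cp1252() helper are illustrative only.

# Bytes 0x00-0x7F and 0xA0-0xFF decode the same way under both codecs.
plain = "café, naïve, ½".encode("iso-8859-1")
assert plain.decode("iso-8859-1") == plain.decode("cp1252")

# 0x80-0x9F is where they part ways: 0x93/0x94 are curly quotes and 0x80 is
# the euro sign in cp1252, but Latin-1 decodes them to invisible C1 controls.
data = b"\x93quoted\x94 costs \x80 5"
as_latin1 = data.decode("iso-8859-1")  # never fails: every byte has a code point
as_cp1252 = data.decode("cp1252")

print(repr(as_latin1))  # '\x93quoted\x94 costs \x80 5'
print(as_cp1252)        # “quoted” costs € 5

# A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned in cp1252,
# so Python's codec rejects them even though Latin-1 accepts them.
try:
    b"\x81".decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 rejects 0x81:", exc.reason)


def looks_like_cp1252(raw: bytes) -> bool:
    """Hypothetical heuristic: text labelled ISO-8859-1 that contains bytes in
    0x80-0x9F is probably really Windows-1252, since real text rarely uses C1
    control codes."""
    return any(0x80 <= b <= 0x9F for b in raw)


print(looks_like_cp1252(plain))  # False
print(looks_like_cp1252(data))   # True
```

Decoding as Latin-1 can never raise, which is why a file that is "nearly" something else can still read cleanly with `encoding='iso-8859-1'`; whether the result is *right* depends on whether those 0x80-0x9F bytes were meant as Windows-1252 punctuation.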
-- 
Ned Batchelder, http://nedbatchelder.com

-- 
https://mail.python.org/mailman/listinfo/python-list