On 12/10/13 6:50 PM, Dan Stromberg wrote:

On Tue, Dec 10, 2013 at 1:07 PM, Petite Abeille
<petite.abei...@gmail.com> wrote:


    On Dec 10, 2013, at 6:25 AM, Dan Stromberg
    <drsali...@gmail.com> wrote:

    > The IMDB flat text file probably came the closest, but it appears
    > to have encoding issues; it's apparently nearly windows-1255, but
    > not quite.

    It's ISO-8859-1.

Thanks - that reads well from CPython 3.3.

Now the question becomes: Why did chardet tell me it was windows-1255?  :)

It probably told you it was Windows-1252 (I'm assuming the last 5 is a typo).

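Windows-1252 is a superset of ISO-8859-1 as far as printable characters go: the two differ only in the bytes 0x80-0x9F, which are control codes in ISO-8859-1 and mostly printable characters (curly quotes, dashes, the euro sign) in Windows-1252. So any text that is correct ISO-8859-1 is also correct Windows-1252. In addition, it's not uncommon to find text marked as ISO-8859-1 that in fact contains characters that make it Windows-1252.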
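
If you want to check the relationship yourself, here is a minimal sketch using only Python 3's built-in codecs. The function name differing_bytes is just an illustration, not something from chardet or the standard library. It decodes every single byte both ways and collects the ones that come out differently.

```python
def differing_bytes():
    """Return the byte values that decode differently (or not at all)
    under cp1252 compared with iso-8859-1."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # ISO-8859-1 maps all 256 byte values, so this never raises.
        latin1 = raw.decode("iso-8859-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a handful of bytes unassigned.
            cp1252 = None
        if latin1 != cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print([hex(v) for v in diffs])

# Bytes 0x93 and 0x94 are curly quotes in Windows-1252 but invisible
# control codes in ISO-8859-1; 0xE9 (e-acute) is the same in both.
sample = b"\x93caf\xe9\x94"
print(sample.decode("cp1252"))
print(repr(sample.decode("iso-8859-1")))
```

Every byte outside 0x80-0x9F decodes identically, which is why a detector has little to go on when a file contains no bytes in that range.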


--
Ned Batchelder, http://nedbatchelder.com

--
https://mail.python.org/mailman/listinfo/python-list