Re: Guessing the encoding from a BOM

Tim Chase Thu, 16 Jan 2014 17:41:15 -0800

On 2014-01-17 11:14, Chris Angelico wrote:
> UTF-8 specifies the byte order
> as part of the protocol, so you don't need to mark it.


You don't need to mark it when writing, but some idiots use it
anyway.  If you're sniffing a file for purposes of reading, you need
to look for it and remove it from the actual data that gets returned
from the file--otherwise, it shows up as corruption in your data.  I
end up with lots of CSV files from customers who have polluted them
with Notepad or had Excel insert a UTF-8 BOM when exporting.  This
means my first column name gets the BOM prefixed onto it when the
file is passed to csv.DictReader, grr.
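
For illustration, here is a minimal sketch of one way to avoid the
problem, assuming Python 3 and input that really is UTF-8.  The
"utf-8-sig" codec decodes UTF-8 and drops a leading BOM if there is
one, and leaves BOM-less data alone.  The helper name read_rows and
the sample bytes are made up for the example.

```python
import csv
import io


def read_rows(raw: bytes):
    """Parse CSV bytes into dicts, ignoring a leading UTF-8 BOM.

    Hypothetical helper for illustration only; assumes the data is
    UTF-8, with or without a BOM.
    """
    # "utf-8-sig" strips the BOM on decode if present, and behaves
    # like plain "utf-8" if it is absent.
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample: the kind of export that starts with EF BB BF.
sample = b"\xef\xbb\xbfname,qty\r\nwidget,3\r\n"

rows = read_rows(sample)

# For comparison: decoding as plain UTF-8 leaves U+FEFF glued onto
# the first column name.
naive = list(csv.DictReader(io.StringIO(sample.decode("utf-8"))))
```

When reading from disk instead of from bytes, passing
encoding="utf-8-sig" (and newline="") to open() should have the same
effect.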

-tkc



-- 
https://mail.python.org/mailman/listinfo/python-list
