On 2014-01-17 11:14, Chris Angelico wrote:
> UTF-8 specifies the byte order
> as part of the protocol, so you don't need to mark it.

You don't need to mark it when writing, but some idiots use it anyway. If you're sniffing a file for purposes of reading, you need to look for it and remove it from the actual data that gets returned from the file--otherwise, it shows up in your data as corruption. I end up with lots of CSV files from customers who have polluted them with Notepad or had Excel insert a UTF-8 BOM when exporting. This means my first column name gets the BOM prefixed onto it when the file is passed to csv.DictReader, grr.

-tkc

--
https://mail.python.org/mailman/listinfo/python-list
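
P.S. A minimal sketch of one way around the BOM problem, assuming Python 3 and that the file really is UTF-8. The helper name read_csv_rows and the sample data are made up for illustration; the "utf-8-sig" codec is the standard-library one that drops a leading BOM when decoding and otherwise behaves like plain UTF-8.

```python
import codecs
import csv
import io


def read_csv_rows(raw_bytes):
    """Parse CSV bytes into dicts, dropping a leading UTF-8 BOM if present.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" strips one leading BOM on decode; without a BOM it
    # decodes exactly like "utf-8".
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Made-up sample: the bytes a BOM-writing editor might produce.
sample = codecs.BOM_UTF8 + b"name,qty\r\nwidget,3\r\n"

# Naive decode: the BOM survives as U+FEFF glued onto the first column name.
naive = list(csv.DictReader(io.StringIO(sample.decode("utf-8"), newline="")))

# BOM-aware decode: the first column name comes back clean.
clean = read_csv_rows(sample)

print(list(naive[0].keys()))
print(list(clean[0].keys()))
```

For a file on disk, the same idea should work by passing encoding="utf-8-sig" (along with newline="") to open() before handing the file object to csv.DictReader.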