Re: Guessing the encoding from a BOM

Chris Angelico Thu, 16 Jan 2014 10:10:30 -0800

On Fri, Jan 17, 2014 at 5:01 AM, Björn Lindqvist <bjou...@gmail.com> wrote:
> 2014/1/16 Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
>> def guess_encoding_from_bom(filename, default):
>>     with open(filename, 'rb') as f:
>>         sig = f.read(4)
>>     # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
>>     if sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
>>         return 'utf_32'
>>     elif sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>>         return 'utf_16'
>>     else:
>>         return default
>
> You might want to add the utf8 bom too: '\xEF\xBB\xBF'.


I'd actually rather not. It would tempt people to pollute UTF-8 files
with a BOM, which is not necessary unless you are MS Notepad.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
