I assume there's no standard library function that wraps codecs.open() to sniff a file's BOM header and open the file with the appropriate encoding?
My reading of the docs leads me to believe that there are 5 possible BOM headers, with multiple names (synonyms?) for the same BOM type:

    BOM          = '\xff\xfe'
    BOM_LE       = '\xff\xfe'
    BOM_UTF16    = '\xff\xfe'
    BOM_UTF16_LE = '\xff\xfe'
    BOM_BE       = '\xfe\xff'
    BOM32_BE     = '\xfe\xff'
    BOM_UTF16_BE = '\xfe\xff'
    BOM64_BE     = '\x00\x00\xfe\xff'
    BOM_UTF32_BE = '\x00\x00\xfe\xff'
    BOM64_LE     = '\xff\xfe\x00\x00'
    BOM_UTF32    = '\xff\xfe\x00\x00'
    BOM_UTF32_LE = '\xff\xfe\x00\x00'
    BOM_UTF8     = '\xef\xbb\xbf'

Is the process of writing a BOM sniffer really as simple as detecting one of these 5 header types and then calling codecs.open() with the appropriate encoding= parameter?

Note: I'm only interested in Unicode encodings. I am not interested in any of the non-Unicode encodings supported by the codecs module.

Thank you,
Malcolm
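
P.S. To make the question concrete, here is a minimal sketch of the kind of sniffer I have in mind. It is untested against real-world files and rests on a few assumptions: it targets Python 3 (where the built-in open() accepts encoding= much like codecs.open()), the names sniff_bom and open_sniffed are made up for illustration, and a file with no BOM is assumed to be UTF-8. The longer UTF-32 BOMs are checked first because the UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.

```python
import codecs

# Longest BOMs first: BOM_UTF32_LE begins with the bytes of BOM_UTF16_LE.
# The generic "utf-16"/"utf-32" codecs read the BOM to pick the byte order
# and strip it; "utf-8-sig" strips the UTF-8 BOM.
_BOM_TABLE = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(path, default="utf-8"):
    """Return a codec name chosen from the file's leading bytes.

    Hypothetical helper; `default` is an assumption for files without a BOM.
    """
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, encoding in _BOM_TABLE:
        if head.startswith(bom):
            return encoding
    return default


def open_sniffed(path, default="utf-8"):
    """Open `path` for reading text using the sniffed encoding."""
    return open(path, "r", encoding=sniff_bom(path, default))
```

One ambiguity I can see: a UTF-16 LE file whose first character after the BOM is U+0000 starts with '\xff\xfe\x00\x00' and would be taken for UTF-32 LE by this sketch, so the approach may be "simple" only for files that do not begin that way.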