Roger Binns wrote:
> "Serge Orlov" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> > I have an impression that handling/production of byte order marks is
> > pretty clear: they are produced/consumed only by two codecs: utf-16 and
> > utf-8-sig. What is not clear?
>
> Are you talking about the C APIs in Python/SQLite (that is what I
> have been discussing) or the language level?

Both. The documentation for PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16
is pretty clear about when a BOM is produced or removed. The only problem is
that you have to find out the host endianness yourself. In Python it's
sys.byteorder; in C you use a hack like

    unsigned long one = 1;
    endianess = (*(char *) &one == 0) ? 1 : -1;

and then pass endianess to PyUnicode_(De/En)codeUTF16 (1 means big-endian,
-1 means little-endian). So I still don't see what is unclear about BOM
production/handling.

>
> At the C level, SQLite doesn't accept boms.

It would be surprising if it did. Quote from
<http://www.unicode.org/faq/utf_bom.html>: "Where the data is typed, such as
a field in a database, a BOM is unnecessary"
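
For the language level, here is a minimal sketch of the BOM behavior described above. It assumes Python 3, where str is Unicode; the variable names are only illustrative and nothing here is specific to SQLite. It shows that the generic utf-16 codec writes a BOM in the host byte order, that naming the byte order explicitly (chosen via sys.byteorder) suppresses it, and that utf-16 and utf-8-sig consume a BOM on decoding.

```python
import codecs
import sys

text = "abc"

# The generic "utf-16" codec prepends a BOM in the host's byte order.
with_bom = text.encode("utf-16")
host_bom = codecs.BOM_UTF16_LE if sys.byteorder == "little" else codecs.BOM_UTF16_BE
assert with_bom.startswith(host_bom)

# Naming the byte order explicitly produces the same bytes without the BOM.
explicit = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
without_bom = text.encode(explicit)
assert without_bom == with_bom[len(host_bom):]

# Decoding with "utf-16" consumes the BOM; "utf-8-sig" does the same for UTF-8.
assert with_bom.decode("utf-16") == text
assert (codecs.BOM_UTF8 + b"abc").decode("utf-8-sig") == text

# Plain "utf-8" does not strip the BOM; it stays in the result as U+FEFF.
assert (codecs.BOM_UTF8 + b"abc").decode("utf-8") == "\ufeff" + text
```

So data handed to an API that does not accept BOMs can be encoded with the explicit utf-16-le or utf-16-be codec, or with plain utf-8.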