Re: Convertion of Unicode to ASCII NIGHTMARE

Serge Orlov Thu, 06 Apr 2006 00:50:44 -0700

Roger Binns wrote:
> "Serge Orlov" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> > I have an impression that handling/production of byte order marks is
> > pretty clear: they are produced/consumed only by two codecs: utf-16 and
> > utf-8-sig. What is not clear?
>
> Are you talking about the C APIs in Python/SQLite (that is what I
> have been discussing) or the language level?


Both. The documentation for PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16
is pretty clear about when a BOM is produced or removed. The only problem is
that you have to find out the host endianness yourself. In Python it's
sys.byteorder; in C you use a hack like

unsigned long one = 1;
/* first byte is 0 on a big-endian host: 1 = big-endian, -1 = little-endian */
endianness = (*(char *) &one == 0) ? 1 : -1;

And then pass endianness to PyUnicode_(De/En)codeUTF16. So I still don't
see what is unclear about BOM production/handling.
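
At the language level, the same behaviour can be checked in a few lines. This is only a sketch, assuming a current Python 3 interpreter (in Python 2 the strings would be unicode objects); the variable names are made up for illustration. It shows that the generic utf-16 codec and utf-8-sig write and strip a BOM, while the explicit-endian codecs and plain utf-8 do neither.

```python
import codecs
import sys

text = "hi"

# BOM that matches the host byte order reported by sys.byteorder.
native_bom = codecs.BOM_UTF16_LE if sys.byteorder == "little" else codecs.BOM_UTF16_BE

# The generic utf-16 codec writes a BOM in host byte order when encoding...
with_bom = text.encode("utf-16")
starts_with_native_bom = with_bom.startswith(native_bom)

# ...and consumes it again when decoding.
round_trip = with_bom.decode("utf-16")

# The explicit-endian codecs never write a BOM.
le_bytes = text.encode("utf-16-le")
le_has_bom = le_bytes.startswith(codecs.BOM_UTF16_LE)

# They do not strip one either: it survives decoding as U+FEFF.
le_decoded = (codecs.BOM_UTF16_LE + le_bytes).decode("utf-16-le")

# utf-8-sig writes and strips the UTF-8 signature; plain utf-8 does neither.
sig_bytes = text.encode("utf-8-sig")
sig_decoded = sig_bytes.decode("utf-8-sig")
plain_decoded = sig_bytes.decode("utf-8")
```

So if the receiving side does not want a BOM, picking utf-16-le or utf-16-be according to sys.byteorder avoids producing one in the first place.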


>
> At the C level, SQLite doesn't accept boms.

It would be surprising if it did. Quote from
<http://www.unicode.org/faq/utf_bom.html>: "Where the data is typed,
such as a field in a database, a BOM is unnecessary"
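
If bytes that already carry a BOM have to go into such a field, one option is to drop the mark first. A minimal sketch in Python, where strip_utf16_bom is a hypothetical helper name (not part of Python or SQLite) and the caller is assumed to know the byte order from elsewhere:

```python
import codecs
import sys

def strip_utf16_bom(data: bytes) -> bytes:
    """Hypothetical helper: return data without a leading UTF-16 BOM, if any."""
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            return data[len(bom):]
    return data

# The generic codec adds a BOM in host byte order; strip it before storing.
payload = strip_utf16_bom("abc".encode("utf-16"))

# Without the BOM the byte order must be known out of band; here it is the host's.
native_codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
restored = payload.decode(native_codec)

# Data without a BOM passes through unchanged.
untouched = strip_utf16_bom("abc".encode("utf-16-be"))
```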

-- 
http://mail.python.org/mailman/listinfo/python-list
