Marc-Andre Lemburg <m...@egenix.com> added the comment:

Amaury Forgeot d'Arc wrote:
>
> Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:
>
>> Could you please check for chars above 0x7f first and then use
>> PyUnicode_Decode() instead of the PyUnicode_FromStringAndSize() API
>
> I concur: PyUnicode_FromStringAndSize() decodes with utf-8 whereas the
> expected conversion char->unicode should use the default encoding (ascii).
> But why is it necessary to check for chars above 0x7f?

The Python default encoding has to be ASCII compatible, so it's better
to use a short-cut for pure-ASCII characters and avoid the complete
round-trip via a temporary Unicode object.

>> (this API should not have been backported from the Python 3.x
>> in Python 2.6,
> This function is still useful when the chars come from a C string literal in
> the source code (btw there should be something about the encoding used in C
> files). But it's not always correctly used even in 3.x, in posixmodule.c for
> example.

The function is really just another interface to the
PyUnicode_DecodeUTF8() API, and its name is misleading in several ways:
Python 2.x uses the default encoding for converting strings without known
encoding to Unicode, the docs for the API say that it decodes Latin-1 (!),
and the interface makes it look like a drop-in replacement for
PyString_FromStringAndSize(), which it isn't for Python 2.x.

For Python 3.x, the default encoding is fixed to UTF-8, so the situation
is different (though the docs are still wrong). However, I don't see the
advantage of using a less explicit name over the direct use of
PyUnicode_DecodeUTF8().

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7649>
_______________________________________
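
For illustration only, the "check for chars above 0x7f" step discussed above could look roughly like the sketch below. The helper name is_pure_ascii() is hypothetical and is not part of the Python C API; the block is plain C so that it compiles without Python.h. The comment at the end describes, as an assumption, how a Python 2.x caller might use the result to choose between the short-cut and PyUnicode_Decode().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Python C API):
   return 1 if the first n bytes of s are all 7-bit ASCII
   (no byte above 0x7f), else 0. */
int
is_pure_ascii(const char *s, size_t n)
{
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* Cast first: plain char may be signed, which would make
           bytes >= 0x80 compare as negative values. */
        if ((unsigned char)s[i] > 0x7f)
            return 0;
    }
    return 1;
}

/* Assumed usage in Python 2.x extension code (sketch, not compiled here):
 *
 *     if (is_pure_ascii(s, n)) {
 *         ... short-cut: every byte maps to the same code point in any
 *         ASCII-compatible default encoding, so no decoding round-trip
 *         via a temporary Unicode object is needed ...
 *     }
 *     else {
 *         u = PyUnicode_Decode(s, n, PyUnicode_GetDefaultEncoding(), NULL);
 *     }
 */
```

The short-cut relies only on the point made above, that the default encoding has to be ASCII compatible; bytes above 0x7f still go through the default-encoding decoder and may raise an error there.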