Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
On Fri, Nov 19, 2010 at 3:06 PM, STINNER Victor <rep...@bugs.python.org> wrote:
> .. Whereas PyUnicode_FromFormatV() converts the format string
> (bytes) to unicode (characters). If you would like a comparison in C, it's
> like printf()+mbstowcs() in the same function.
>

I see. So it is really the

    else
        *s++ = *f;

that surreptitiously widens the characters.

..
> I chose to use ASCII instead of UTF-8, because a UTF-8 decoder is long (210
> lines) and complex (see PyUnicode_DecodeUTF8Stateful()), whereas an ASCII decoder
> is just: "unicode_char = (Py_UNICODE)byte;" (+ an if before to check that 0 <=
> byte <= 127).

I don't think we need 210 lines to replace "*s++ = *f" with proper UTF-8 logic. Even if we do, the code can be shared with PyUnicode_DecodeUTF8, and a UTF-8 iterator may be a welcome addition to the Python C API.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9769>
_______________________________________
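
To make the size argument concrete, here is a rough sketch of what a one-code-point UTF-8 step could look like in place of the byte-widening "*s++ = *f". This is only an illustration, not code from the patch or from CPython: the name utf8_next is hypothetical, and it writes to a uint32_t rather than Py_UNICODE so that it compiles without Python headers. It rejects overlong forms, surrogates, and values above U+10FFFF, but it is not stateful the way PyUnicode_DecodeUTF8Stateful() is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Python C API).
 * Decode one UTF-8 sequence starting at s, with n bytes available.
 * On success, store the code point in *cp and return the number of
 * bytes consumed (1 to 4). Return 0 on invalid or truncated input. */
static size_t
utf8_next(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                 /* ASCII: the old fast path */
        *cp = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) {  /* 110xxxxx: 2-byte sequence */
        len = 2; c = s[0] & 0x1F; min = 0x80;
    }
    else if ((s[0] & 0xF0) == 0xE0) {  /* 1110xxxx: 3-byte sequence */
        len = 3; c = s[0] & 0x0F; min = 0x800;
    }
    else if ((s[0] & 0xF8) == 0xF0) {  /* 11110xxx: 4-byte sequence */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    }
    else {
        return 0;                      /* stray continuation or bad lead byte */
    }

    if (n < len)
        return 0;                      /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                  /* not a 10xxxxxx continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}
```

A caller would loop over the format string, advancing by the returned length and treating a return of 0 as a decoding error. Whether something like this could be shared with PyUnicode_DecodeUTF8 is a separate question that the sketch does not settle.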