[issue9769] PyUnicode_FromFormatV() doesn't handle non-ascii text correctly

Alexander Belopolsky Fri, 19 Nov 2010 12:58:31 -0800

Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:


On Fri, Nov 19, 2010 at 3:06 PM, STINNER Victor <rep...@bugs.python.org> wrote:
> .. Whereas PyUnicode_FromFormatV() converts the format string
> (bytes) to unicode (characters). If you would like a comparison in C, it's
> like printf()+mbstowcs() in the same function.
>

I see.  So it is really the

        else
            *s++ = *f;

that surreptitiously widens the characters.
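
To illustrate the widening, here is a minimal sketch, not the actual
PyUnicode_FromFormatV() code. widen_bytes() is a hypothetical helper and
wchar_t stands in for Py_UNICODE. The cast to unsigned char is my addition
to keep the result deterministic; a plain "*s++ = *f" would sign-extend
bytes >= 0x80 on platforms where char is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy a NUL-terminated byte string into a wide
   buffer one byte at a time, mirroring the "*s++ = *f" pattern.
   Returns the number of wide characters written, excluding the
   terminator. The caller must supply a buffer of strlen(f) + 1 slots. */
static size_t widen_bytes(const char *f, wchar_t *s)
{
    size_t n = 0;

    assert(f != NULL && s != NULL);
    while (*f != '\0') {
        /* Each byte becomes its own wide character; multi-byte UTF-8
           sequences are never recombined into one code point. */
        *s++ = (wchar_t)(unsigned char)*f++;
        n++;
    }
    *s = L'\0';
    return n;
}
```

Feeding it the two-byte UTF-8 encoding of U+00E9 ("\xc3\xa9") produces two
wide characters, 0xC3 and 0xA9, rather than the single character 0xE9.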

..
> I chose to use ASCII instead of UTF-8, because a UTF-8 decoder is long (210
> lines) and complex (see PyUnicode_DecodeUTF8Stateful()), whereas an ASCII
> decoder is just "unicode_char = (Py_UNICODE)byte;" preceded by an if that
> checks 0 <= byte <= 127.

I don't think we need 210 lines to replace "*s++ = *f" with proper
UTF-8 logic.  Even if we do, the code can be shared with
PyUnicode_DecodeUTF8, and a UTF-8 iterator may be a welcome addition to
the Python C API.
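
As a rough sketch of what such an iterator could look like (utf8_next() is
a hypothetical name and signature, not an existing Python C API function),
a strict decoder that handles one code point per call fits in a few dozen
lines. It assumes the caller knows the end of the buffer and wants a hard
failure on malformed input; error handlers and stateful decoding, which
account for part of the size of the real decoder, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 code point starting at *pp.
   On success, store it in *out, advance *pp past the sequence, and
   return 0. On end of input or malformed data, return -1 and leave
   *pp unchanged. */
static int utf8_next(const unsigned char **pp, const unsigned char *end,
                     uint32_t *out)
{
    const unsigned char *p;
    uint32_t cp, min;
    int extra, i;

    assert(pp != NULL && *pp != NULL && end != NULL && out != NULL);
    p = *pp;
    if (p >= end)
        return -1;

    /* The lead byte gives the sequence length and the smallest code
       point that may legally use that length. */
    if (p[0] < 0x80)                { cp = p[0];        extra = 0; min = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; min = 0x10000; }
    else
        return -1;                  /* stray continuation or invalid lead */

    if (end - p <= extra)
        return -1;                  /* truncated sequence */

    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -1;              /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;

    *out = cp;
    *pp = p + 1 + extra;
    return 0;
}
```

A format-string loop could call this in place of the byte copy and treat a
-1 return as an invalid format string.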

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9769>
_______________________________________