Ezio Melotti <ezio.melo...@gmail.com> added the comment:

AFAIU the macro returns lone surrogates as they are. This means that:
1) if the string contains only surrogate pairs, Py_UNICODE_NEXT will iterate on scalar values[0];
2) if the string contains only lone surrogates, it will iterate on code points[1];
3) if it contains both, it will be half and half (i.e. scalar values where the surrogates are paired, falling back on code points where they aren't).
(For strings without surrogates, iterating on scalar values or on code points is the same.)
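To make the three cases concrete, here is a rough Python model of the behaviour as I understand it. It is only a sketch, not the macro itself: the string is modelled as a list of 16-bit code units, and the name iter_next is made up for illustration.

```python
def iter_next(units):
    """Yield code points from a sequence of UTF-16 code units.

    Hypothetical model of the behaviour described above: a high
    surrogate immediately followed by a low surrogate is combined into
    one scalar value; any other surrogate is yielded unchanged.
    """
    i = 0
    n = len(units)
    while i < n:
        ch = units[i]
        i += 1
        if 0xD800 <= ch <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            # Well-formed pair: combine into a supplementary scalar value.
            ch = 0x10000 + ((ch - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # Otherwise ch is a BMP character or a lone surrogate, passed through.
        yield ch


# 1) only surrogate pairs: one scalar value per pair
print([hex(c) for c in iter_next([0xD83D, 0xDE00])])
# 2) only lone surrogates: each surrogate code point returned as it is
print([hex(c) for c in iter_next([0xDC00, 0xD800])])
# 3) both: the lone surrogate is passed through, the pair is combined
print([hex(c) for c in iter_next([0xD800, 0xD83D, 0xDE00])])
```

In case 3 the caller gets a mix of a surrogate code point and a scalar value from the same loop, and nothing signals which is which.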
Are these semantics correct for all (or at least most) of the places where the macro will be used? Would a stricter version (one that rejects lone surrogates and iterates on scalar values only) be useful in addition to, or as an alternative to, Py_UNICODE_NEXT?

[0]: http://unicode.org/glossary/#unicode_scalar_value
[1]: http://unicode.org/glossary/#code_point

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________