STINNER Victor <victor.stin...@haypocalc.com> added the comment:

Proof of concept of patch fixing this issue:
 - parse_syntax_error() reads the text line into a PyUnicodeObject* instead of a "const char**"
 - create utf8_to_unicode_offset(): convert byte offset to a number of characters. The Python version should be something like:
     def utf8_to_unicode_offset(text, byte_offset):
        # Encode the line, keep only the bytes before the offset,
        # then decode again and count the characters.
        utf8 = text.encode("utf-8")
        utf8 = utf8[:byte_offset]
        text = str(utf8, "utf-8")
        return len(text)
 - reuse adjust_offset() from py3k_adjust_cursor_at_syntax_error_v2.patch, but force the use of wcswidth() because HAVE_WCSWIDTH is not defined by configure
 - print_error_text() works on unicode characters and not on bytes!

The patch should be refactored:
 - move adjust_offset(), utf8_to_unicode_offset() and utf8_len() into unicodeobject.c. You might create a new method "width()" for the unicode type. This method can be used to fix the center(), ljust() and rjust() unicode methods (see issue #3446).

----------
Added file: http://bugs.python.org/file13354/issue2382.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2382>
_______________________________________
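
For illustration only, here is a rough pure-Python sketch of how the byte-offset conversion and a display-width calculation could be combined to place the error cursor. It is not the code in the attached patch. display_width() and caret_column() are hypothetical names, the width rule based on unicodedata is only an approximation of what wcswidth() does, and ignoring a partial trailing character is an assumption about what should happen when the offset falls inside a multi-byte sequence.

```python
import unicodedata


def utf8_to_unicode_offset(text, byte_offset):
    """Convert a byte offset into the UTF-8 form of text to a character count."""
    prefix = text.encode("utf-8")[:byte_offset]
    # Assumption: if byte_offset cuts a multi-byte character in half,
    # drop the incomplete tail instead of raising UnicodeDecodeError.
    return len(prefix.decode("utf-8", errors="ignore"))


def display_width(text):
    """Hypothetical stand-in for wcswidth(): approximate terminal columns."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        # Wide (W) and fullwidth (F) characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def caret_column(line, byte_offset):
    """Number of columns to skip before printing '^' under line."""
    chars = utf8_to_unicode_offset(line, byte_offset)
    return display_width(line[:chars])


line = 'x = "日本語" +'
column = caret_column(line, 16)  # 16 is the byte offset of the '+'
print(line)
print(" " * column + "^")
```

A width() method on the unicode type, as suggested above, would play the role of display_width() here and could be shared with center(), ljust() and rjust().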