Marc-Andre Lemburg <m...@egenix.com> added the comment:

Amaury Forgeot d'Arc wrote:
>
> Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:
>
> The problem is actually wider::
>     >>> getattr(None, "\udc80")
>     Segmentation fault
>
> An idea would be to change _PyUnicode_AsDefaultEncodedString and allow
> unpaired surrogates (utf8+surrogateescape, as explained in PEP383), but
> I fear the consequences...
>
> The code that fails seems pretty common:
>     PyErr_Format(PyExc_AttributeError,
>                  "'%.50s' object has no attribute '%.400s'",
>                  tp->tp_name, _PyUnicode_AsString(name));
> It would be unfortunate to replace all usages of _PyUnicode_AsString to
> check the return value.

The use of _PyUnicode_AsString() is wrong here. There are several
cases where it can fail, e.g. MemoryErrors, embedded NULs, encoding
errors. The same is true for _PyUnicode_AsStringAndSize(), which is
why I turned them into Python interpreter private APIs before 3.0
shipped.

If you want a fail-safe stringified version of a Unicode object,
your only choice is to create a new API that does error checking,
properly clears the error and then returns a reference to a
constant string, e.g. "<repr-error>".

----------
nosy: +lemburg
title: Python 3.1 segfaults when invalid UTF-8 characters are passed from command line -> Python 3.1 segfaults when invalid UTF-8 characters are passed from command line

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6697>
_______________________________________
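
The fail-safe approach described in the comment (check for failure, clear the error, fall back to a constant string) can be illustrated at the Python level. This is only a sketch of the idea, not the interpreter fix, which would have to be written in C. The names safe_utf8 and format_attribute_error are hypothetical and are not existing CPython APIs. The NUL check is an assumption that stands in for the "embedded NULs" failure case.

```python
# Constant placeholder, as suggested in the comment above.
FALLBACK = b"<repr-error>"


def safe_utf8(name):
    """Return `name` encoded as UTF-8, or FALLBACK if that is not possible.

    Hypothetical helper: it mirrors the "check, clear the error, return a
    constant" pattern rather than any real CPython function.
    """
    try:
        encoded = name.encode("utf-8")
    except (UnicodeEncodeError, MemoryError):
        # The exception is swallowed here, which plays the role of
        # clearing the error indicator in C.
        return FALLBACK
    # A C string cannot represent embedded NUL bytes, so treat them as
    # a failure too (assumption made for this sketch).
    if b"\x00" in encoded:
        return FALLBACK
    return encoded


def format_attribute_error(type_name, attr_name):
    """Build the message from the quoted PyErr_Format call, fail-safe."""
    attr = safe_utf8(attr_name).decode("utf-8")
    return "'%.50s' object has no attribute '%.400s'" % (type_name, attr)


if __name__ == "__main__":
    # A lone surrogate cannot be encoded as strict UTF-8.
    print(format_attribute_error("NoneType", "\udc80"))
    print(format_attribute_error("NoneType", "spam"))
```

For comparison, the surrogateescape idea from the quoted message corresponds to "\udc80".encode("utf-8", "surrogateescape"), which maps the lone surrogate back to the single byte 0x80 instead of failing.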