[issue6697] Python 3.1 segfaults when invalid UTF-8 characters are passed from command line

Marc-Andre Lemburg Wed, 19 Aug 2009 05:51:58 -0700

Marc-Andre Lemburg <m...@egenix.com> added the comment:

Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:
> 
> The problem is actually wider::
>     >>> getattr(None, "\udc80")
>     Segmentation fault
> An idea would be to change _PyUnicode_AsDefaultEncodedString and allow
> unpaired surrogates (utf8+surrogateescape, as explained in PEP383), but
> I fear the consequences...
>
> The code that fails seems pretty common:
>       PyErr_Format(PyExc_AttributeError,
>                    "'%.50s' object has no attribute '%.400s'",
>                    tp->tp_name, _PyUnicode_AsString(name));
> It would be unfortunate to replace all usages of _PyUnicode_AsString to
> check the return value.


The use of _PyUnicode_AsString() is wrong here: the return value is
passed on unchecked, but the call can fail and return NULL in several
cases, e.g. MemoryErrors, embedded NULs, encoding errors.
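
As a rough illustration of two of these cases, here is a sketch at the
Python level. It does not exercise the C code path itself, and the helper
name utf8_failure is hypothetical. It shows that a lone surrogate such as
the one in the report cannot be encoded with the strict UTF-8 codec, and
that a C "char *" consumer stops reading at an embedded NUL.

```python
import ctypes


def utf8_failure(text):
    """Return the name of the exception raised by strict UTF-8 encoding, or None."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return None


# Encoding error: a lone surrogate (what PEP 383 surrogateescape decoding
# produces for an undecodable byte) is rejected by the strict UTF-8 codec.
print(utf8_failure("\udc80"))
print(utf8_failure("attr"))

# Embedded NUL: reading the buffer back as a C string stops at the first NUL,
# so everything after it is silently lost.
print(ctypes.c_char_p(b"ab\0cd").value)
```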

The same is true for _PyUnicode_AsStringAndSize(), which is why
I turned them into Python interpreter private APIs before 3.0
shipped.

If you want a fail-safe stringified version of a Unicode object,
your only choice is to create a new API that does error checking,
properly clears the error and then returns a reference to a constant
string, e.g. "<repr-error>".
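
A minimal sketch of that idea, modelled in Python rather than C so that it
can be run as-is. The names REPR_ERROR, safe_name and
attribute_error_message are hypothetical and not part of any existing API;
a real implementation would live in the C layer and call PyErr_Clear()
where this sketch swallows the exception.

```python
REPR_ERROR = "<repr-error>"  # constant fallback string, as suggested above


def safe_name(name):
    """Return name if it can be passed on as a UTF-8 C string, else REPR_ERROR."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        # The exception is discarded here; in C this is where the pending
        # error would have to be cleared explicitly.
        return REPR_ERROR
    if "\0" in name:
        # An embedded NUL would truncate the string on the C side.
        return REPR_ERROR
    return name


def attribute_error_message(type_name, attr_name):
    """Build the message from the quoted code without ever failing on attr_name."""
    return "'%.50s' object has no attribute '%.400s'" % (
        type_name,
        safe_name(attr_name),
    )


print(attribute_error_message("NoneType", "\udc80"))
```

The point of the design is that the helper always returns a usable string,
so callers that only format an error message need no extra checks.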

----------
nosy: +lemburg
title: Python 3.1 segfaults when invalid UTF-8 characters are passed from
command line -> Python 3.1 segfaults when invalid UTF-8 characters are
passed from command line

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6697>
_______________________________________