[issue7330] PyUnicode_FromFormat segfault

STINNER Victor Thu, 17 Feb 2011 23:41:48 -0800

STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> > It looks like your patch fixes #10829: you should add tests for that, you 
> > can just reuse the tests of my patch (attached to #10829).
> 
> Sorry, but I think my patch doesn't fix #10829.


Ah ok, so don't add failing tests :-)

> > You should also avoid the creation of a temporary unicode object (it can be 
> > slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() 
> > does already truncate the string because you can pass an arbitrary length.
> 
> In order to use Py_UNICODE_COPY, I have to create a unicode object with 
> required length first.

No you don't. You can copy a substring of the input string with
Py_UNICODE_COPY: just pass a smaller length.

> > With your patch, "%.200s" truncates the input string to 200 *characters*, 
> > but I think that it should truncate to 200 *bytes*, as printf does.
> 
> Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode 
> object. Then how to truncate to 200 *bytes*?

You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
pass a size smaller than strlen(s).

case 's':
{
    /* UTF-8 */
    const char *s = va_arg(count, const char*);
    PyObject *str = PyUnicode_DecodeUTF8(s, strlen(s), "replace");
    if (!str)
        goto fail;
    n += PyUnicode_GET_SIZE(str);
    /* Remember the str and switch to the next slot */
    *callresult++ = str;
    break;
}

I don't know if we should truncate to a number of bytes, or a number of
characters.

> > I don't like this change because I hate having to compute manually strings 
> > length. It should that it would be easier if you format directly strings 
> > with width and precision at step 3, instead of doing it at step 4: so you 
> > can just read the length of the formatted string, and it avoids having to 
> > handle width/precision in two steps (which may be inconsistent :-/).
> 
> Do you mean combine step 3 and step 4 together? Currently step 3 is just to 
> compute the biggest width value and step 4 is to compute exact width and do 
> the real format work. Only by doing real format we can get the exact width of 
> a string. So I have to compute each width twice in both step 3 and step 4. Is 
> combining the two steps in to one a good idea?

"Do you mean combine step 3 and step 4 together?"

Yes, but I am no more sure that it is the right thing to do.

> > Can you add tests for "%.s"? I would like to know if "%.s" is different 
> > than "%s" :-)
> 
> Err, '%.s' causes unexpected result both with and without my patch. Maybe 
> it's still another bug?

If the fix (always have the same behaviour) is short, it would be nice
to include it in your patch.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7330>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue7330] PyUnicode_FromFormat segfault

Reply via email to