STINNER Victor <[email protected]> added the comment:
> > It looks like your patch fixes #10829: you should add tests for that, you
> > can just reuse the tests of my patch (attached to #10829).
>
> Sorry, but I think my patch doesn't fix #10829.
Ah ok, so don't add failing tests :-)
> > You should also avoid the creation of a temporary unicode object (it can be
> > slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY()
> > does already truncate the string because you can pass an arbitrary length.
>
> In order to use Py_UNICODE_COPY, I have to create a unicode object with
> required length first.
No you don't. You can copy a substring of the input string with
Py_UNICODE_COPY: just pass a smaller length.
> > With your patch, "%.200s" truncates the input string to 200 *characters*,
> > but I think that it should truncate to 200 *bytes*, as printf does.
>
> Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode
> object. Then how to truncate to 200 *bytes*?
You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
pass a size smaller than strlen(s).
case 's':
{
/* UTF-8 */
const char *s = va_arg(count, const char*);
PyObject *str = PyUnicode_DecodeUTF8(s, strlen(s), "replace");
if (!str)
goto fail;
n += PyUnicode_GET_SIZE(str);
/* Remember the str and switch to the next slot */
*callresult++ = str;
break;
}
I don't know if we should truncate to a number of bytes, or a number of
characters.
> > I don't like this change because I hate having to compute manually strings
> > length. It should that it would be easier if you format directly strings
> > with width and precision at step 3, instead of doing it at step 4: so you
> > can just read the length of the formatted string, and it avoids having to
> > handle width/precision in two steps (which may be inconsistent :-/).
>
> Do you mean combine step 3 and step 4 together? Currently step 3 is just to
> compute the biggest width value and step 4 is to compute exact width and do
> the real format work. Only by doing real format we can get the exact width of
> a string. So I have to compute each width twice in both step 3 and step 4. Is
> combining the two steps in to one a good idea?
"Do you mean combine step 3 and step 4 together?"
Yes, but I am no more sure that it is the right thing to do.
> > Can you add tests for "%.s"? I would like to know if "%.s" is different
> > than "%s" :-)
>
> Err, '%.s' causes unexpected result both with and without my patch. Maybe
> it's still another bug?
If the fix (always have the same behaviour) is short, it would be nice
to include it in your patch.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue7330>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com