[issue7330] PyUnicode_FromFormat segfault

STINNER Victor Fri, 11 Feb 2011 04:49:54 -0800

STINNER Victor <victor.stin...@haypocalc.com> added the comment:

It looks like your patch fixes #10829: you should add tests for that, you can 
just reuse the tests of my patch (attached to #10829).


---

unicode_format() looks suboptimal.

+    memset(buffer, ' ', width);
+    width_unicode = PyUnicode_FromStringAndSize(buffer, width);

You should avoid this byte string (buffer) and use memset() on the Unicode 
string directly. Something like:

Py_UNICODE *u;
Py_ssize_t i;
width_unicode = PyUnicode_FromUnicode(NULL, width);
u = PyUnicode_AS_UNICODE(width_unicode);
for(i=0; i < width; i++) {
  *u = (Py_UNICODE)' ';
  u++;
}

You should also avoid the creation of a temporary unicode object (it can be 
slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does 
already truncate the string because you can pass an arbitrary length.

---

I don't like "unicode_format" function name: it sounds like "str.format()" in 
Python. A suggestion: "unicode_format_align"

---

With your patch, "%.200s" truncates the input string to 200 *characters*, but I 
think that it should truncate to 200 *bytes*, as printf does.

---

-                n += PyUnicode_GET_SIZE(str);
+                n += width > PyUnicode_GET_SIZE(str) ? width : 
PyUnicode_GET_SIZE(str);

I don't like this change because I hate having to compute manually strings 
length. It should that it would be easier if you format directly strings with 
width and precision at step 3, instead of doing it at step 4: so you can just 
read the length of the formatted string, and it avoids having to handle 
width/precision in two steps (which may be inconsistent :-/).

---

Your patch implements %.100s (and %.100U): we might decide what to do with 
#10833 before commiting your patch.

---

In my opinion, the patch is a little bit too big. We may first commit the fix 
on the code parsing the width and precision: fix #10829?

---

Can you add tests for "%.s"? I would like to know if "%.s" is different than 
"%s" :-)

---

-                                 "must be a sequence, not %200s",
+                                 "must be a sequence, not %.200s",

Hum, I think that they are many other places where such fix should be done. 
Nobody noticed this typo before because %.200s nor %200s were implemented 
(#10833).


---

Finally, do you really need to implement %200s, %2.5s and %.100s? I don't know, 
but I would be ok to commit the patch if you fix it for all of my remarks :-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7330>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue7330] PyUnicode_FromFormat segfault

Reply via email to