[issue16254] PyUnicode_AsWideCharString() increases string size

Martin v. Löwis Tue, 16 Oct 2012 14:35:29 -0700

Martin v. Löwis added the comment:

As stated, this is not a bug: there is no memory leak, nor any deviation from 
documented behavior.


You are right that it fills the wstr pointer, by calling 
PyUnicode_AsUnicodeAndSize in unicode_aswidechar, and then copying the data to 
a fresh buffer.

This is merely the simplest implementation; it's certainly possible to improve 
it. Contributions are welcome.

A number of things need to be considered:
- Computing the wstr size is somewhat expensive on a system with 16-bit 
wchar_t, since the result may need surrogate pairs and the string has to be 
scanned to count them.
- I would suggest that, if possible, the wstr representation be handed out of 
the unicode object to the caller (resetting wstr to NULL). This should give 
the greatest code reuse while avoiding unnecessary copying.
- It's not possible to do so for strings where wstr is shared with the 
canonical representation (i.e. a UCS-2 string on 16-bit wchar_t, and a UCS-4 
string on 32-bit wchar_t).
- I don't think wstr should be cleared if it was already filled when the 
function got called. Instead, wstr should only be returned if it was originally 
NULL.
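
To illustrate the first point, here is a minimal sketch of the size 
computation for a 16-bit wchar_t. It is not the CPython code: the helper names 
utf16_units_needed and ucs4_to_utf16 are made up for this example, and 
uint16_t stands in for a 16-bit wchar_t. It only shows why the length cannot 
be known without scanning the string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 16-bit units needed to store `len`
   UCS-4 code points as UTF-16. Every code point above U+FFFF needs a
   surrogate pair, i.e. one extra unit, so the whole input is scanned. */
static size_t utf16_units_needed(const uint32_t *ucs4, size_t len)
{
    size_t units = len;
    for (size_t i = 0; i < len; i++) {
        if (ucs4[i] > 0xFFFF)
            units++;
    }
    return units;
}

/* Hypothetical helper: encode into `out`, which must have room for
   utf16_units_needed(ucs4, len) units. Returns the units written. */
static size_t ucs4_to_utf16(const uint32_t *ucs4, size_t len, uint16_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = ucs4[i];
        if (cp > 0xFFFF) {
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
        } else {
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

The two passes (count, then encode) are the cost referred to above; on a 
32-bit wchar_t system the length is simply the number of code points.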

----------
nosy: +loewis

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue16254>
_______________________________________