New submission from David Beazley:

The PyUnicode_AsWideCharString() function is described as creating a new buffer 
of type wchar_t allocated by PyMem_Alloc() (which must be freed by the user).   
However, if you use this function, it causes the size of the original string 
object to permanently increase.  For example, suppose you had some extension 
code like this:

static PyObject *py_receive_wchar(PyObject *self, PyObject *args) {
  PyObject *obj;
  wchar_t *s;
  Py_ssize_t len;

  if (!PyArg_ParseTuple(args, "U", &obj)) {
    return NULL;
  }
  if ((s = PyUnicode_AsWideCharString(obj, &len)) == NULL) {
    return NULL;
  }
  /* Do nothing */
  PyMem_Free(s);
  Py_RETURN_NONE;
}

Now, try an experiment (assume that the above extension function is available 
as 'receive_wchar'). 

>>> s = "Hell"*1000
>>> len(s)
4000
>>> import sys
>>> sys.getsizeof(s)
4049
>>> receive_wchar(s)
>>> sys.getsizeof(s)
20053
>>>
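
The same measurement can be sketched without compiling an extension by calling the C API through ctypes. This is only an illustrative sketch: the helper name measure_growth is hypothetical, and it assumes a CPython build where ctypes.pythonapi exposes PyUnicode_AsWideCharString() and PyMem_Free(). On later CPython releases that no longer keep a cached wchar_t copy, the two sizes would be expected to match.

```python
import ctypes
import sys

# Declare the two C API calls used below (signatures per the C API docs).
_as_wide = ctypes.pythonapi.PyUnicode_AsWideCharString
_as_wide.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_wide.restype = ctypes.c_void_p

_mem_free = ctypes.pythonapi.PyMem_Free
_mem_free.argtypes = [ctypes.c_void_p]
_mem_free.restype = None


def measure_growth(text):
    """Return (size_before, size_after) around one wchar_t conversion.

    Hypothetical helper: mirrors the receive_wchar() extension function,
    converting the string and freeing the buffer immediately.
    """
    before = sys.getsizeof(text)
    length = ctypes.c_ssize_t(0)
    buf = _as_wide(text, ctypes.byref(length))
    _mem_free(buf)  # the caller owns the buffer and must free it
    after = sys.getsizeof(text)
    return before, after


if __name__ == "__main__":
    s = "Hell" * 1000
    print(measure_growth(s))
```

If the second number is larger than the first, the string object has retained extra storage even though the returned buffer was freed.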

It seems that PyUnicode_AsWideCharString() fills in the wstr field of the 
associated PyASCIIObject structure from PEP 393 (I haven't verified this).  
Once filled, it never seems to be discarded.
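
The numbers in the session above are consistent with that guess. As a quick check, assuming a platform where sizeof(wchar_t) is 4 (an assumption; it is 2 on Windows), a cached copy of the string plus a terminating null character would account for exactly the observed growth. The function name expected_growth is made up for this illustration.

```python
def expected_growth(length, wchar_size=4):
    """Bytes needed for a wchar_t copy of `length` characters plus a null.

    wchar_size=4 is an assumption about the platform, not something
    stated in the report.
    """
    return (length + 1) * wchar_size


# Sizes reported by sys.getsizeof() in the session above.
size_before = 4049
size_after = 20053
observed = size_after - size_before

print(observed)               # 16004
print(expected_growth(4000))  # 16004
```

The match suggests, but does not prove, that a full wchar_t copy of the 4000-character string is being kept on the object.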

Background:  I am trying to figure out how to convert from Unicode to 
(wchar_t *, int) in a way that doesn't cause a permanent increase in the 
memory footprint of the original Unicode object.  I'm also trying to stay 
away from deprecated Unicode APIs.

----------
components: Extension Modules, Interpreter Core, Unicode
messages: 173089
nosy: dabeaz, ezio.melotti
priority: normal
severity: normal
status: open
title: PyUnicode_AsWideCharString() increases string size
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16254>
_______________________________________