On Fri, Mar 29, 2013 at 12:11 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> From the PEP:
>
> """
> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
> representation. It is thus identical to the existing
> _PyUnicode_AsString, which is removed. The function will compute the
> utf8 representation when first called. Since this representation will
> consume memory until the string object is released, applications
> should use the existing PyUnicode_AsUTF8String where possible (which
> generates a new string object every time). APIs that implicitly
> converts a string to a char* (such as the ParseTuple functions) will
> use PyUnicode_AsUTF8 to compute a conversion.
> """
>
> So the utf8 representation is not populated when the string is
> created, but when a utf8 representation is requested, and only when
> requested by the API that returns a char*, not by the API that returns
> a bytes object.

Since the PEP specifically mentions ParseTuple string conversion, I am thinking that this is probably the motivation for caching it. A string that is passed into a C function (one that uses any of the various UTF-8 char* format specifiers) is perhaps likely to be passed into that function again at some point, so the UTF-8 representation is kept around to avoid the need to recompute it on each call.
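
For anyone who wants to poke at this from the interpreter, here is a rough sketch of how the difference between the two APIs could be observed. It assumes CPython 3.3 or later, where ctypes.pythonapi exposes the interpreter's own C API; the variable names and the sample string are mine, not anything from the PEP. It only shows that repeated PyUnicode_AsUTF8 calls hand back the same buffer while PyUnicode_AsUTF8String builds a fresh bytes object each time. It says nothing about why the cache was designed that way.

```python
import ctypes

# CPython-only assumption: ctypes.pythonapi gives access to the C API
# of the running interpreter.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
# Ask for the raw char* address instead of letting ctypes copy it to bytes.
as_utf8.restype = ctypes.c_void_p

as_utf8_string = ctypes.pythonapi.PyUnicode_AsUTF8String
as_utf8_string.argtypes = [ctypes.py_object]
as_utf8_string.restype = ctypes.py_object

# A non-ASCII sample string, so the UTF-8 form differs from the
# string's internal storage.
s = "caf\u00e9 au lait"

# char* API: the UTF-8 buffer is computed on the first call and then
# kept on the string object, so the second call returns the same address.
ptr1 = as_utf8(s)
ptr2 = as_utf8(s)
same_buffer = (ptr1 == ptr2)

# bytes API: every call creates a new bytes object.
b1 = as_utf8_string(s)
b2 = as_utf8_string(s)
distinct_objects = (b1 is not b2)

# Read the NUL-terminated cached buffer back for comparison.
cached_bytes = ctypes.string_at(ptr1)

print("same char* on both calls:", same_buffer)
print("distinct bytes objects:  ", distinct_objects)
print("cached buffer matches:   ", cached_bytes == b1)
```

The cached buffer stays alive as long as s does, which is the memory cost the PEP warns about.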