On Fri, Aug 6, 2010 at 6:39 PM, dmtr <dchich...@gmail.com> wrote:
<snip>
>> > Well... 63 bytes per item for very short unicode strings... Is there
>> > any way to do better than that? Perhaps some compact unicode objects?
>>
>> If you think that unicode objects are going to be *smaller* than byte
>> strings, I think you're badly informed about the nature of unicode.
>
> I don't think that unicode objects are going to be *smaller*!
> But AFAIK internally CPython uses UTF-8?
Nope. unicode objects internally use UCS-2 or UCS-4, depending on how
CPython was ./configure-d; the former is the default. See PEP 261.

Cheers,
Chris
--
http://blog.rebertia.com
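
P.S. A quick way to check which kind of build an interpreter is: look at
sys.maxunicode, which is 0xFFFF on a narrow (UCS-2) build and 0x10FFFF on a
wide (UCS-4) build. The sketch below is only illustrative; the helper name
unicode_build_kind is made up for this example. Note that on CPython 3.3 and
later the build-time distinction no longer exists (PEP 393), so
sys.maxunicode is always 0x10FFFF there.

```python
import sys

def unicode_build_kind(maxunicode=sys.maxunicode):
    """Classify a CPython build by its largest supported code point.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if maxunicode == 0xFFFF:
        # Narrow build: 2 bytes per code unit, non-BMP characters are
        # stored as surrogate pairs.
        return "narrow (UCS-2)"
    if maxunicode == 0x10FFFF:
        # Wide build (or any CPython >= 3.3, where this is always the value).
        return "wide (UCS-4)"
    return "unknown"

print(unicode_build_kind())
```

Either way, each character in a unicode object costs at least 2 bytes on
those older builds, on top of the fixed per-object overhead.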