Paul Rubin wrote:
> Duncan Booth explains why that doesn't work.  But I don't see any big
> problem with a byte count function that lets you specify an encoding:
>
>     u = buf.decode('UTF-8')
>     # ... later ...
>     u.bytes('UTF-8') -> 3
>     u.bytes('UCS-4') -> 4
>
> That avoids creating a new encoded string in memory, and for some
> encodings, avoids having to scan the unicode string to add up the
> lengths.

It requires a fairly large change to code and API for a relatively uncommon problem. How often do you need to know how many bytes an encoded Unicode string takes up without needing the encoded string itself?
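
For what it's worth, the count in the quoted example can be computed by a small helper function, with no new string method. What follows is only a sketch in Python 3 syntax, not an existing API: `encoded_length` is a hypothetical name, and `utf-32-le` stands in for UCS-4 because the plain `utf-32` codec prepends a byte order mark. It skips the encode step for UTF-32 and UTF-8 and falls back to actually encoding for every other codec.

```python
def encoded_length(text, encoding):
    """Return how many bytes `text` would occupy in `encoding`.

    Hypothetical helper, not part of the standard library.
    """
    name = encoding.lower().replace('_', '-')

    # Fixed-width encoding without a BOM: no scan needed at all.
    if name in ('utf-32-le', 'utf-32-be'):
        return 4 * len(text)

    # UTF-8: one pass over the code points, no encoded copy is built.
    if name in ('utf-8', 'utf8'):
        total = 0
        for ch in text:
            cp = ord(ch)
            if cp < 0x80:
                total += 1
            elif cp < 0x800:
                total += 2
            elif cp < 0x10000:
                total += 3
            else:
                total += 4
        return total

    # Any other codec: fall back to encoding and measuring the result.
    return len(text.encode(encoding))


u = '\u20ac'  # EURO SIGN, a single code point
print(encoded_length(u, 'UTF-8'))      # 3
print(encoded_length(u, 'utf-32-le'))  # 4
```

One caveat: the UTF-8 branch counts a lone surrogate as 3 bytes, whereas `str.encode('utf-8')` would raise an error for it, so the two only agree on strings that are actually encodable.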