willie wrote:

> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> # 11100010 10011100 10001100
> buf = "\xE2\x9C\x8C"
>
> u = buf.decode('UTF-8')
>
> # ... later ...
>
> u.bytes() -> 3
>
> (goes through each code point and calculates
> the number of bytes that make up the character
> according to the encoding)
Yup, it's a dead horse. As suggested elsewhere in the thread, the unicode object is not the proper place for this functionality. Also, as suggested, it's not even the desired functionality: what's really wanted is the ability to tell how long the string is going to be in various encodings.

That's easy enough to do today - just encode the darn thing and use len(). I don't see any reason to expand the language to support a database product that goes out of its way to make it difficult for developers.

John Roth
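For concreteness, a minimal sketch of the encode-and-len() approach, written in the same Python 2 style as the quoted code (the U+270C example is borrowed from willie's post; the particular target encodings are just illustrative choices):

buf = "\xE2\x9C\x8C"              # U+270C encoded as UTF-8 (3 bytes)
u = buf.decode('UTF-8')           # unicode object; len(u) == 1 code point

# Byte length in any encoding: encode, then take len() of the result.
print len(u.encode('UTF-8'))      # 3
print len(u.encode('utf-16-be'))  # 2
print len(u.encode('utf-32-be'))  # 4

No new method on the unicode object is needed: the encoded byte string already carries the answer for whichever encoding the database (or anything else) actually uses.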