willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> # 11100010 10011100 10001100
> buf = "\xE2\x9C\x8C"
>
> u = buf.decode('UTF-8')
>
> # ... later ...
>
> u.bytes() -> 3
>
> (goes through each code point and calculates
> the number of bytes that make up the character
> according to the encoding)

Yup, it's a dead horse. As suggested elsewhere in the thread, the unicode
object is not the proper place for this functionality. Also, as suggested,
it's not even the desired functionality: what's really wanted is the ability
to tell how long the string is going to be in various encodings. That's easy
enough to do today: just encode the darn thing and use len().

I don't see any reason to expand the language to support a database product
that goes out of its way to make it difficult for developers.

John Roth
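
P.S. For anyone following along, the "encode it and count" approach looks
roughly like this with the U+270C example from the quoted post (Python 2
syntax to match the thread; nothing beyond the standard codecs is assumed):

buf = "\xE2\x9C\x8C"              # UTF-8 bytes for U+270C
u = buf.decode('UTF-8')           # unicode object, one code point

print len(u)                      # 1  (code points)
print len(u.encode('UTF-8'))      # 3  (bytes when encoded as UTF-8)
print len(u.encode('UTF-16-BE'))  # 2  (bytes as UTF-16, big-endian, no BOM)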
