Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, Marc 'BlackJack'
> Rintsch wrote:
>
> > In <[EMAIL PROTECTED]>,
> > Preben Randhol wrote:
> >
> >> Is there a way to calculate in characters
> >> and not in bytes to represent the characters.
> >
> > Decode the byte string and use `len()` on the unicode string.
>
> Hmmm, for some reason
>
> len(u"C\u0327")
>
> returns 2.

If Python ever provides this functionality, I guess it would look something like u"C\u0327".width() == 1. But it's not clear when unicode.org will provide recommended fixed-width font column information for *all* characters.

I recently stumbled upon the Tamil language, where for example u'\u0b95\u0bcd', u'\u0b95\u0bbe', u'\u0b95\u0bca' and u'\u0b95\u0bcc' look like they are 1, 2, 3 and 4 columns wide. To add insult to injury, these 4 symbols are all considered *single* letter symbols :) If your email reader is able to show them, here they are in all their glory: க், கா, கொ, கௌ.
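
For the counting half of the question (not the column width half), here is a rough sketch of one way to get 1 for each of the examples above. It assumes that skipping every code point whose Unicode general category is a mark (Mn, Mc, Me) is close enough to "one letter". The name count_base_chars is made up for this post; it is not something the standard library provides, and it is not full grapheme cluster segmentation.

```python
import unicodedata


def count_base_chars(text):
    """Count code points that are not combining marks (categories Mn, Mc, Me).

    Hypothetical helper, not a standard-library function. It approximates
    "letters as a reader sees them" and says nothing about display width.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


# The C-with-cedilla example and the four Tamil examples from this thread.
samples = ["C\u0327", "\u0b95\u0bcd", "\u0b95\u0bbe", "\u0b95\u0bca", "\u0b95\u0bcc"]

for s in samples:
    # len() counts code points; count_base_chars() skips the combining marks.
    print(ascii(s), len(s), count_base_chars(s))

# NFC normalization only helps when a precomposed code point exists,
# as it does for C + combining cedilla.
composed = unicodedata.normalize("NFC", "C\u0327")
print(ascii(composed), len(composed))
```

Normalizing to NFC brings "C\u0327" down to a single code point, but there are no precomposed code points for the Tamil combinations, so that trick does not carry over to them. None of this answers how many columns a string occupies in a fixed-width font, which is the part that still lacks data.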