Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, Marc 'BlackJack'
> Rintsch wrote:
>
> > In <[EMAIL PROTECTED]>,
> > Preben Randhol wrote:
> >
> >> Is there a way to calculate in characters
> >> and not in bytes to represent the characters.
> >
> > Decode the byte string and use `len()` on the unicode string.
>
> Hmmm, for some reason
>
>     len(u"C\u0327")
>
> returns 2.
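
That result is expected, since `len()` counts code points and u"C\u0327" is a 'C' followed by a separate combining cedilla. A minimal sketch of the difference, assuming Python 3 and the standard `unicodedata` module (the variable names are only for illustration):

```python
import unicodedata

decomposed = "C\u0327"                  # 'C' + COMBINING CEDILLA
raw = decomposed.encode("utf-8")        # the same text as a UTF-8 byte string
composed = unicodedata.normalize("NFC", decomposed)

print(len(raw))         # number of bytes
print(len(decomposed))  # number of code points before normalization
print(len(composed))    # number of code points after NFC composition
```

NFC only helps here because a precomposed form (U+00C7) exists for this pair; many base-plus-mark sequences have none, so normalization is not a general answer.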

If Python ever provides this functionality, I guess it would look like
u"C\u0327".width() == 1. But it's not clear when unicode.org will
provide recommended fixed-width font column information for *all*
characters. I recently stumbled upon the Tamil language, where for example
u'\u0b95\u0bcd', u'\u0b95\u0bbe', u'\u0b95\u0bca', u'\u0b95\u0bcc'
look like they are 1, 2, 3 and 4 columns wide. To add insult to injury,
these 4 symbols are all considered *single* letters :) If your
email reader is able to show them, here they are in all their glory:
க், கா, கொ, கௌ.
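
As a rough approximation of counting "letters" rather than code points, one can skip combining marks. This is only a sketch, assuming Python 3 and the standard `unicodedata` module; `count_base_characters` is a made-up helper name, it is not full grapheme cluster segmentation, and it says nothing about how many columns a font will use:

```python
import unicodedata

def count_base_characters(text):
    """Count code points that are not combining marks (categories Mn, Mc, Me)."""
    return sum(1 for ch in text
               if not unicodedata.category(ch).startswith("M"))

samples = ["\u0b95\u0bcd", "\u0b95\u0bbe", "\u0b95\u0bca", "\u0b95\u0bcc",
           "C\u0327"]
for s in samples:
    # len() counts code points; the helper counts only the base characters
    print(repr(s), len(s), count_base_characters(s))
```

For each of these samples `len()` gives 2 while the helper gives 1, which matches the "single letter" view but still not the 1 to 4 column widths seen on screen.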

-- 
http://mail.python.org/mailman/listinfo/python-list
