On 11/25/2012 07:48 AM, kobayashi wrote:
Encoding is utf-8.
By "screen length" I mean this: an ASCII character has width 1, and a
character twice as wide as an ASCII character has width 2.
It is not the number of bytes, but the width as drawn.
As you say, it depends on the font. I will consider that carefully.


Don't forget also that there are combining characters. To wit:

>>> "\u00e1"
'á'
>>> "\u0061\u0301"
'á'

(U+00E1 is an 'a' with an acute accent; U+0061 is an unaccented 'a'; U+0301 is a combining acute accent.)


So far the discussion has covered single Unicode code points that display as a double-wide glyph (I did not know about those!). Combining characters go the other way. Depending on how you want to look at it, either a sequence of several code points renders as a single glyph, or the combining characters are zero-width code points.
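
Both cases can be handled in one pass if you take the "zero-width code point" view. Here is a minimal sketch using only the standard unicodedata module. The name display_width is made up for illustration. It assumes a terminal that draws East Asian Wide and Fullwidth characters in two columns and combining marks in none. It ignores control characters, other zero-width characters such as joiners, and "ambiguous width" characters, so treat it as an estimate rather than a definitive answer.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns `text` occupies.

    Hypothetical helper: assumes wide/fullwidth characters take two
    columns and combining marks take none. The real width still
    depends on the font and the terminal.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks (e.g. U+0301) attach to the previous
            # character and do not occupy a column of their own.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # East Asian Wide or Fullwidth: counted as two columns.
            width += 2
        else:
            width += 1
    return width
```

With this, display_width("\u00e1") and display_width("\u0061\u0301") give the same result, and a string of CJK ideographs counts two columns per character. Alternatively, unicodedata.normalize("NFC", s) composes 'a' plus U+0301 into U+00E1 before counting, but not every combining sequence has a precomposed form, so skipping combining marks is the more general approach.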

Evan

--
http://mail.python.org/mailman/listinfo/python-list
