Unicode characters

Paul Johnston Mon, 04 Sep 2006 06:41:45 -0700

Hi
I have a string which I convert into a list and then loop over,
printing each character's glyph and its numeric value.


# -*- coding: utf-8 -*-

thestring = "abcd"
thelist = list(thestring)

for c in thelist:
    print c,
    print ord(c)

This works fine for Latin characters, but when I put in a non-Latin
Unicode character, a two-byte character comes back as two separate
characters. For example, an Arabic alef returns

*  216
* 167

(the first asterisk is the empty set symbol, the second a double "s")

Putting in sequential characters, e.g. alef, beh, teh marbuta, gives me
sequential listings:
216  167
216  168
216  169
So it is reading the correct details.
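
As a rough check that these pairs really are the UTF-8 bytes of the
three letters, here is a minimal sketch. It assumes Python 2.6 or later
(or Python 3), since bytearray is not available in 2.4, and decode_pair
is just a name made up for this illustration.

```python
# -*- coding: utf-8 -*-
# Byte pairs copied from the listing above.
pairs = [(216, 167), (216, 168), (216, 169)]

def decode_pair(pair):
    # Hypothetical helper: build a byte sequence from the two integers
    # and decode it as UTF-8, giving a one-character Unicode string.
    return bytearray(pair).decode("utf-8")

# Code point of the single character each pair decodes to.
code_points = [ord(decode_pair(p)) for p in pairs]
print(code_points)  # [1575, 1576, 1577], i.e. U+0627, U+0628, U+0629
```

Each pair decodes to one character, and the code points are
consecutive, which matches the sequential listing.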


Is there any way to get the c in the for loop to recognise that it is
reading a multiple-byte character?
I have followed the info in PEP 0263 and am using Python 2.4.3 Build
12 on a Windows box within Eclipse 3.2.0 and Python plugins 1.2.2.
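
What I am after might look something like the sketch below, where the
byte string is decoded to Unicode before looping. This is untested on
2.4.3: the b"..." literal assumes Python 2.6 or later (on 2.4 a plain
"..." string would stand in for it), and glyph_values is a made-up name.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for alef, beh, teh marbuta (the pairs 216 167, 216 168, 216 169).
raw = b"\xd8\xa7\xd8\xa8\xd8\xa9"

# Decode once, so that iteration yields characters rather than bytes.
text = raw.decode("utf-8")

def glyph_values(s):
    # Hypothetical helper: pair each character with its code point.
    return [(c, ord(c)) for c in s]

for c, n in glyph_values(text):
    print(n)
```

Here raw has six bytes but text has three characters, so the loop runs
once per letter instead of once per byte.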

Cheers Paul
