On Apr 12, 3:45 pm, Peter Robinson <[EMAIL PROTECTED]> wrote:
> Dear list
> I am at my wits end on what seemed a very simple task:
> I have some greek text, nicely encoded in utf8, going in and out of a
> xml database, being passed over and beautifully displayed on the web.
> For example: the most common greek word of all 'kai' (or και if your
> mailer can see utf8)
> So all I want to do is:
> step through this string a character at a time, and do something for
> each character (actually set a width attribute somewhere else for each
> character)
>
> Should be simple, yes?
> turns out to be near impossible. I tried using a simple index
> character routine such as ustr[0]..ustr[1]... and this gives rubbish.
> So I use len() to find out how long my simple greek string is, and of
> course it is NOT three characters long.
The word is three characters long, but its utf8-encoded incarnation is
six bytes long, and len() on a byte string counts bytes, not characters.
Indexing it gives you one byte at a time, which is half of a Greek
letter. utf-8 is not unicode.

>
> A day of intensive searching around the lists tells me that unicode
> and python is a moving target: so many fixes are suggested for similar
> problems, none apparently working with mine.
>
> Here is the best I can do, so far
> I convert the utf8 string using
> ustr = repr(unicode(thisword, 'iso-8859-7'))

Don't do that. It decodes each of the six bytes as a separate
iso-8859-7 character, and repr() then hands you the escaped display
form rather than the text itself.

If you have a utf8 string, convert it to unicode like this:

    ustr = unicode(the_utf8_string, 'utf8')

If you have a string encoded in iso-8859-7, convert it to unicode like
this:

    ustr = unicode(the_iso_8859_7_string, 'iso-8859-7')

Then inspect it like this:

    print repr(ustr)

Here's a sample interactive session:

>>> thisword = '\xce\xba\xce\xb1\xce\xb9'
>>> ustr = unicode(thisword, 'utf8')
>>> len(ustr)
3
>>> print repr(ustr)
u'\u03ba\u03b1\u03b9'
>>> import unicodedata
>>> [unicodedata.name(x) for x in ustr]
['GREEK SMALL LETTER KAPPA', 'GREEK SMALL LETTER ALPHA', 'GREEK SMALL LETTER IOTA']

Suggested reading: the Python Unicode HOWTO at
http://www.amk.ca/python/howto/unicode

This may be handy: http://unicode.org/charts/PDF/U0370.pdf

HTH,
John
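P.S. Here is a minimal sketch of the per-character loop you describe,
assuming the word arrives as utf8 bytes. The width table (char_widths,
DEFAULT_WIDTH) and its values are invented for illustration; I don't
know where your real widths come from. It uses the .decode() method
instead of unicode() only so that the same lines also run on Python 3.

```python
import unicodedata

# The utf8 bytes for 'kai', same as in the interactive session.
thisword = b'\xce\xba\xce\xb1\xce\xb9'

# Decode once, up front. From here on, len() and indexing work per
# character rather than per byte.
ustr = thisword.decode('utf8')

# Hypothetical width table: the values are made up, and anything not
# listed falls back to DEFAULT_WIDTH.
DEFAULT_WIDTH = 10
char_widths = {u'\u03b9': 4}  # pretend iota is narrow

# Step through the string one character at a time.
widths = []
for ch in ustr:
    widths.append((unicodedata.name(ch), char_widths.get(ch, DEFAULT_WIDTH)))

for name, width in widths:
    print('%s -> %d' % (name, width))
```

The point is to decode exactly once at the boundary, do all the
per-character work on the unicode object, and encode back to utf8 only
when the text goes out to the database or the web page.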