Dear list,

I am at my wits' end on what seemed a very simple task. I have some Greek text, nicely encoded in UTF-8, going in and out of an XML database, being passed over and beautifully displayed on the web. For example: the most common Greek word of all, 'kai' (or και if your mailer can see UTF-8).

So all I want to do is step through this string a character at a time and do something for each character (actually, set a width attribute somewhere else for each character).

Should be simple, yes? It turns out to be near impossible. I tried using a simple character index such as ustr[0], ustr[1], ... and this gives rubbish. So I used len() to find out how long my simple Greek string is, and of course it is NOT three characters long.

A day of intensive searching around the lists tells me that Unicode in Python is a moving target: so many fixes are suggested for similar problems, none apparently working with mine.

Here is the best I can do so far. I convert the UTF-8 string using

    ustr = repr(unicode(thisword, 'iso-8859-7'))

For kai this gives the following:

    u'\u039e\u038a\u039e\xb1\u039e\u0389'

So now things should be simple, yes? Just go through this and identify each character... Not so simple at all.

    k, kappa, turns out to be TWO \u strings, not one: \u039e\u038a
    similarly, iota is also two \u strings: \u039e\u0389
    alpha is a \u string followed by a \x string: \u039e\xb1

Looking elsewhere in the record, my particular favourite is the midpoint character: this comes out as \u03b1\x90\xa7 ! And in the middle of all this, there are some non-Unicode characters: \u039e\u038fc is o followed by c!

Well, I don't have many characters to deal with, and I could cope with this mess by tedious matching character by character. But surely there is a better way...

Help, please.

Peter Robinson: [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list
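
For reference, the output quoted above can be reproduced with a short sketch, which may help narrow down where things go wrong. It assumes that thisword holds the raw UTF-8 bytes for 'kai' as they come out of the database, and it is written for Python 3, where bytes.decode() plays the role of unicode(thisword, ...). The variable names wrong, right and chars are illustrative only. Under those assumptions, the pairs of \u strings look like the individual UTF-8 bytes of each letter being read as separate ISO-8859-7 characters, and decoding with 'utf-8' instead yields three characters.

```python
# Assumption: these are the UTF-8 bytes for the Greek word 'kai'
# (kappa = CE BA, alpha = CE B1, iota = CE B9).
thisword = b"\xce\xba\xce\xb1\xce\xb9"

# Decoding as ISO-8859-7 maps each BYTE to one character,
# so a three-letter word becomes six characters.
wrong = thisword.decode("iso-8859-7")
print(len(wrong), ascii(wrong))

# Decoding with the encoding the bytes were written in
# gives one character per letter.
right = thisword.decode("utf-8")
print(len(right))

# Step through the word a character at a time.
chars = [(ch, hex(ord(ch))) for ch in right]
for ch, codepoint in chars:
    print(ch, codepoint)
```

If this matches what is happening in the real record, the per-character step (setting a width attribute, say) could go inside the final loop.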