Peter Robinson wrote:
> Dear list
> I am at my wits end on what seemed a very simple task:
> I have some greek text, nicely encoded in utf8, going in and out of a
> xml database, being passed over and beautifully displayed on the web.
> For example: the most common greek word of all 'kai' (or και if your
> mailer can see utf8)
> So all I want to do is:
> step through this string a character at a time, and do something for
> each character (actually set a width attribute somewhere else for each
> character)
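
As John already said: UTF-8 ain't Unicode. Unicode assigns a number (a code point) to each character. UTF-8 is an encoding, like ASCII or Latin-1, that turns those code points into bytes, but it differs in its inner workings: it uses a variable number of bytes per character. A single character may take up to 4 bytes in standard UTF-8 (the original design allowed up to 6). Stepping through the raw UTF-8 byte string therefore gives you bytes, not characters; decode it to a unicode string first and step through that.

I highly recommend Joel's article on Unicode:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.html

Christian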
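
A minimal sketch of the decode-then-iterate point, assuming Python 3 (where str holds code points and bytes holds encoded data). The helper name chars_with_byte_lengths is hypothetical and only for illustration; it is not from the original thread.

```python
# Assumption: Python 3. In Python 2 the same idea applies with
# byte_string.decode("utf-8") producing a unicode object.

# The UTF-8 bytes as they might arrive from the database (illustrative data).
raw = "και".encode("utf-8")


def chars_with_byte_lengths(data):
    """Decode UTF-8 bytes and return (character, encoded byte count) pairs.

    Hypothetical helper for illustration only.
    """
    text = data.decode("utf-8")  # bytes -> code points
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


pairs = chars_with_byte_lengths(raw)

print(len(raw))  # number of bytes in the encoded string
for ch, n in pairs:
    # One iteration per code point; this is where a per-character
    # action (such as setting a width attribute) would go.
    print(ch, n)
```

One caveat: this iterates over code points. Accented Greek written with combining marks can use more than one code point per visible character, so such text may need normalizing first (for example with unicodedata.normalize("NFC", text)).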