On Wednesday, 10 January 2007 23:18, José Matos wrote:
> On Wednesday 10 January 2007 9:33 pm, Georg Baum wrote:
> > Ah, now I know the problem: If we add string literals to document.body we
> > need to prefix them with u to get unicode string literals: u'bla'. Now I
> > know where to search.
>
> That is enough to drive anyone (read: me) crazy. :-)
Me too, but I found a workaround:

    import unicodedata

    # Unfortunately we have a mixture of unicode strings and plain strings,
    # because we write string literals as 'xxx', never as u'xxx'.
    # Therefore we may have to try twice to normalize the data.
    try:
        document.body[i] = unicodedata.normalize("NFKD", document.body[i])
    except TypeError:
        # Plain (byte) string: decode it as UTF-8 first, then normalize.
        document.body[i] = unicodedata.normalize("NFKD", unicode(document.body[i], 'utf-8'))

That works; now I have to find the next bug :-(

Georg
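The snippet above is Python 2, where 'xxx' is a byte string and u'xxx' is a unicode string. In Python 3 the same mix shows up as str versus UTF-8 encoded bytes, and the same "try, and on TypeError decode first" idea still applies, because unicodedata.normalize() raises TypeError when given bytes. The following is only a sketch under that assumption, not the actual LyX conversion code; normalize_lines is a hypothetical helper name.

```python
import unicodedata


def normalize_lines(lines, encoding="utf-8"):
    """Return NFKD-normalized str copies of lines that may be str or bytes.

    Hypothetical helper: assumes any bytes entries are encoded in `encoding`.
    """
    result = []
    for line in lines:
        try:
            # Works directly if the line is already a (Unicode) str.
            result.append(unicodedata.normalize("NFKD", line))
        except TypeError:
            # bytes are rejected by normalize(), so decode them first.
            result.append(unicodedata.normalize("NFKD", line.decode(encoding)))
    return result


body = ["caf\u00e9", "caf\u00e9".encode("utf-8")]
print(normalize_lines(body))
```

Both entries come out identical: the precomposed "é" (U+00E9) is decomposed into "e" followed by the combining acute accent U+0301, regardless of whether the line started out as str or bytes.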