>>> ascii strings and unicode strings are perfectly interchangeable, with
>>> some minor exceptions.
>>
>> It's not only translate, it's decode too...
>
> why would you use decode on the strings you get back from ET?

Long story... Some time ago, when computers didn't support charsets, people invented so-called "cyrillic fonts", i.e. fonts that have Cyrillic glyphs mapped onto the Latin positions. Since our Cyrillic alphabet has 31 characters, some characters in those fonts were mapped to { or ~ etc. Of course this "solution" is awful, but it was the only one at the time. So I'm writing a Python script that takes an OpenDocument file and translates it to UTF-8...

PS. I use translate now, but I was making a general note that unicode and string objects are not 100% interchangeable. translate, encode and decode are especially problematic.

Anyway, I wrap the output of ET in unicode() now... I don't see another, better solution.

--
http://mail.python.org/mailman/listinfo/python-list
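
To illustrate the translate difference, here is a rough sketch, not the actual script. It is written in Python 3 syntax, where str plays the role that unicode had in 2.x and bytes the role of the old str. The mapping table and the names fix_text and fix_bytes are made up for the example; the real table depends on the particular font.

```python
# Hypothetical mapping from legacy "cyrillic font" positions to real
# Cyrillic letters. The actual table depends on the font in question.
LEGACY_TO_CYRILLIC = {
    "a": "\u0430",  # Cyrillic a
    "b": "\u0431",  # Cyrillic be
    "{": "\u0448",  # Cyrillic sha
    "~": "\u0447",  # Cyrillic che
}

# Text strings: translate() takes a mapping keyed by code point, so a
# character can be mapped to any other character.
TEXT_TABLE = str.maketrans(LEGACY_TO_CYRILLIC)


def fix_text(text):
    """Replace legacy Latin-position characters with Cyrillic ones."""
    return text.translate(TEXT_TABLE)


# Byte strings: translate() takes a 256-byte table and can only map one
# byte to another byte, so it cannot produce multi-byte UTF-8 output.
BYTE_TABLE = bytes.maketrans(b"{~", b"[^")


def fix_bytes(raw):
    """Byte-for-byte substitution only; shown for contrast."""
    return raw.translate(BYTE_TABLE)


if __name__ == "__main__":
    converted = fix_text("ba{~")
    print(converted.encode("utf-8"))
```

The same method name takes a different kind of table on each type, which is why code written for one does not simply work on the other.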