Anton Vredegoor wrote:
> I'm trying to import text from an open office document (save as .sxw and
> read the data from content.xml inside the sxw-archive using
> elementtree and such tools).
>
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfect because there are still characters
> in it like \93 or \94. Has anyone handled this before? I'd rather not
> reinvent the wheel and start translating strings 'by hand'.

I extracted content.xml from a test file, and its header is:

<?xml version="1.0" encoding="UTF-8"?>

So any XML library should handle it just fine, without you having to guess the encoding.
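
For what it's worth, here is a rough sketch of what I mean, not a tested recipe for real documents. It hands the raw bytes of content.xml to ElementTree and lets the parser pick the encoding up from the XML declaration. The helper names (read_sxw_text, make_sample_sxw), the sample namespace URI, and the idea of collecting every element whose local name is "p" are my own assumptions for illustration; a real .sxw file may need different element names. As a side note, bytes 0x93 and 0x94 are how cp1252 encodes the curly quotes U+201C and U+201D, so those are most likely ordinary typographic quotes that come through fine once the text is handled as Unicode.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_sxw_text(source):
    """Return the text of every element whose local name is "p" (assumed paragraph tag)."""
    with zipfile.ZipFile(source) as archive:
        # Keep the data as bytes: the parser reads the encoding
        # from the XML declaration, so no manual decoding is needed.
        raw = archive.read("content.xml")
    root = ET.fromstring(raw)
    # Compare local names only, so the sketch does not hard-code a namespace URI.
    return [
        "".join(elem.itertext())
        for elem in root.iter()
        if isinstance(elem.tag, str) and elem.tag.rsplit("}", 1)[-1] == "p"
    ]


def make_sample_sxw():
    """Build a tiny in-memory stand-in for an .sxw archive (hypothetical content)."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<doc xmlns:text="urn:example:text">'  # made-up namespace URI
        "<text:p>\u201cquoted\u201d text</text:p>"
        "</doc>"
    ).encode("utf-8")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", xml)
    buffer.seek(0)
    return buffer


paragraphs = read_sxw_text(make_sample_sxw())
```

With a real file you would pass the path of the .sxw archive instead of the in-memory sample. The strings you get back are Unicode, so encode them only at the point where you write them out, using whatever encoding the destination expects.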