Anton Vredegoor wrote:
> Serge Orlov wrote:
>
> > I extracted content.xml from a test file and the header is:
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > So any xml library should handle it just fine, without you trying to
> > guess the encoding.
>
> Yes, my header also says UTF-8. However, some kind person sent me an
> e-mail stating that since I am getting \x94 and such output when using
> repr (even if str is giving correct output) there could be some problem
> with the XML-file not being completely UTF-8. Or is there some other
> reason I'm getting these \x94 codes? Or maybe this is just as it should
> be and there's no problem at all?

Indeed, just load the file into ElementTree. Extending the example you
posted before:

    import elementtree.ElementTree as ET

    data = zin.read(x)
    doc = ET.fromstring(data)
    officetag = "{http://openoffice.org/2000/office}"
    body = doc.find(".//" + officetag + "body")
    for fragment in body.getchildren():
        ... process one fragment of document's body ...
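
On the \x94 codes themselves, here is a minimal sketch of why they can show up in perfectly valid UTF-8. It assumes Python 3 and the standard-library xml.etree.ElementTree rather than the standalone elementtree package used above. The miniature content.xml, the in-memory zip, and the names OFFICE, xml_text and fragments are made up for illustration; they are not taken from your file. The point is that repr escapes every non-ASCII byte, and a character such as U+2014 is stored in UTF-8 as the three bytes \xe2\x80\x94, so seeing \x94 alone does not indicate a broken file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE = "{http://openoffice.org/2000/office}"

# Hypothetical miniature content.xml (an assumption, not a real document).
# "\u2014" is U+2014 EM DASH, a non-ASCII character.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<office:document-content'
    ' xmlns:office="http://openoffice.org/2000/office">'
    '<office:body><p>before \u2014 after</p><p>second</p></office:body>'
    '</office:document-content>'
)

# Pack it into an in-memory zip archive so it can be read back with zin.read().
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zout:
    zout.writestr("content.xml", xml_text.encode("utf-8"))

with zipfile.ZipFile(buf) as zin:
    data = zin.read("content.xml")  # raw bytes, still UTF-8 encoded

# repr() of the raw bytes shows \xe2\x80\x94 for the dash: valid UTF-8,
# merely displayed as escapes.
print(repr(data))
has_escape = b"\xe2\x80\x94" in data

# The parser honours the encoding declared in the XML header, so no
# guessing is needed; element text comes back as decoded Unicode.
doc = ET.fromstring(data)
body = doc.find(".//" + OFFICE + "body")

# Iterating over an element yields its children. getchildren() was
# removed from the standard-library ElementTree in Python 3.9.
fragments = [fragment.text for fragment in body]
print(fragments)
```

If the parser accepts the data without raising an error, the bytes were well-formed for the declared encoding, which is a stronger check than inspecting the repr output by eye.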