Re: not quite 1252

Serge Orlov Fri, 28 Apr 2006 06:06:56 -0700

Anton Vredegoor wrote:
> Serge Orlov wrote:
>
> > I extracted content.xml from a test file and the header is:
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > So any xml library should handle it just fine, without you trying to
> > guess the encoding.
>
> Yes, my header also says UTF-8. However some kind person sent me an
> e-mail stating that since I am getting \x94 and such output when using
> repr (even if str is giving correct output) there could be some problem
> with the XML-file not being completely UTF-8. Or is there some other
> reason I'm getting these \x94 codes? Or maybe this is just as it should
> be and there's no problem at all?
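
Seeing \x94 in repr output does not by itself mean the file is not UTF-8. repr escapes every non-ASCII byte, and 0x94 occurs inside valid UTF-8 sequences: the em dash U+2014, for instance, encodes as E2 80 94. A minimal sketch of one way to check, assuming a current Python 3 interpreter and a made-up sample string (the helper name is_utf8 is hypothetical, not from the thread):

```python
# Assumption: Python 3. The sample text is invented for illustration.
text = "wait \u2014 \u201cquoted\u201d"
data = text.encode("utf-8")

# repr() escapes each non-ASCII byte, so valid UTF-8 still shows \x.. codes.
print(repr(data))


def is_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The encoded em dash contains the byte 0x94, yet the data is valid UTF-8.
print(b"\x94" in data, is_utf8(data))

# A lone 0x94 (a closing quote in cp1252) is not valid UTF-8 on its own.
print(is_utf8(b"\x94"))
```

If the decode succeeds for the whole of content.xml, the \x94 codes are most likely just how repr displays multi-byte characters, not a sign of a mixed encoding.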


Indeed, just load the file into ElementTree. Extending the example you
posted before:

import elementtree.ElementTree as ET

data = zin.read(x)
# The parser reads the encoding from the XML header; no guessing needed.
doc = ET.fromstring(data)
officetag = "{http://openoffice.org/2000/office}"
body = doc.find(".//" + officetag + "body")
for fragment in body.getchildren():
    pass  # ... process one fragment of the document's body ...
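
For a version that runs on its own, here is a rough sketch under a few assumptions: it uses xml.etree.ElementTree (the standard-library descendant of the elementtree package), and the sample bytes are a made-up stand-in for what zin.read(x) would return, not the real content.xml.

```python
import xml.etree.ElementTree as ET

OFFICE = "{http://openoffice.org/2000/office}"

# Invented stand-in for the bytes read from content.xml.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<office:document-content'
    ' xmlns:office="http://openoffice.org/2000/office">'
    '<office:body><p>one \u2014 two</p><p>three</p></office:body>'
    '</office:document-content>'
).encode("utf-8")

# The parser honours the encoding declared in the XML header.
doc = ET.fromstring(sample)
body = doc.find(".//" + OFFICE + "body")

# Iterating an element yields its direct children.
fragments = [child.text for child in body]
print(fragments)
```

The text of each fragment comes back as a decoded string, so the em dash in the first paragraph is a single character rather than three bytes.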

