Re: ElementTree XML parsing problem

Mike Wed, 27 Apr 2011 14:00:52 -0700

On 4/27/2011 12:24 PM, Neil Cerutti wrote:

On 2011-04-27, Mike <Mike@invalid.invalid> wrote:

I'm using ElementTree to parse an XML file, but it stops at the
second record (id = 002), which contains a non-standard ascii
character, ?. Here's the XML:


<?xml version="1.0"?>
<snapshot time="Mon Apr 25 08:47:23 PDT 2011">
<records>
<record id="001" education="High School" employment="7 yrs" />
<record id="002" education="Universit?t Bremen" employment="3 years" />
<record id="003" education="River College" employment="5 yrs" />
</records>
</snapshot>

The complaint offered up by the parser is

Unexpected error opening simple_fail.xml: not well-formed
(invalid token): line 5, column 40


It seems to be an invalid XML document, as another poster
indicated.

and if I change the line to eliminate the ?, everything is
wonderful. The parser is perfectly happy with this
modification:

<record id="002" education="University Bremen" employment="3 yrs" />

I can't find anything in the ElementTree docs about allowing
additional text characters or coercing strange ascii to
Unicode.
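
The parser error can be narrowed down without ElementTree at all. The
sketch below assumes the umlaut in the file is stored as the single
Latin-1 byte 0xE4 (a guess based on the symptoms, not something the
original post confirms). With no encoding declaration, an XML parser
treats the input as UTF-8, and that byte is not valid UTF-8 on its own.
The helper name first_non_utf8_offset is made up for this illustration.

```python
# Hypothetical reproduction of the failing record line. The umlaut is
# assumed to be the single Latin-1 byte 0xE4.
line = b'<record id="002" education="Universit\xe4t Bremen" employment="3 years" />'


def first_non_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc.start
    return None


offset = first_non_utf8_offset(line)
print(offset)                      # offset of the offending byte, if any
print(line.decode('iso-8859-1'))   # the same bytes read as Latin-1
```

If the offset points at the character the parser complains about, the
file is most likely in a single-byte encoding such as ISO-8859-1 and
simply lacks an encoding declaration.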


If you're not the one generating that bogus file, then you can
specify the encoding yourself by passing it to an XMLParser.

   import xml.etree.ElementTree as etree
   # Open in binary mode so the parser, not open(), decodes the bytes.
   with open('file.xml', 'rb') as xml_file:
     parser = etree.XMLParser(encoding='ISO-8859-1')
     root = etree.parse(xml_file, parser=parser).getroot()
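
The same approach can be tried without a file on disk. This is only a
sketch: xml_bytes is an assumed stand-in for the real file, with the
umlaut written as the Latin-1 byte 0xE4 and no encoding declaration.
It first shows the default parse failing, then the parse with the
encoding override reading all three records.

```python
import xml.etree.ElementTree as etree

# Assumed stand-in for the file: Latin-1 bytes, no encoding declaration.
xml_bytes = (
    b'<?xml version="1.0"?>\n'
    b'<snapshot time="Mon Apr 25 08:47:23 PDT 2011">\n'
    b'<records>\n'
    b'<record id="001" education="High School" employment="7 yrs" />\n'
    b'<record id="002" education="Universit\xe4t Bremen" employment="3 years" />\n'
    b'<record id="003" education="River College" employment="5 yrs" />\n'
    b'</records>\n'
    b'</snapshot>\n'
)

# Without an override the parser assumes UTF-8 and rejects the 0xE4 byte.
try:
    etree.fromstring(xml_bytes)
    default_parse_failed = False
except etree.ParseError:
    default_parse_failed = True

# With the override the bytes are decoded as ISO-8859-1 instead.
parser = etree.XMLParser(encoding='ISO-8859-1')
root = etree.fromstring(xml_bytes, parser=parser)
educations = [rec.get('education') for rec in root.iter('record')]

print(default_parse_failed)
print(educations)
```

Note that the override only helps if the guess about the encoding is
right. If the file were actually in some other single-byte encoding,
the parse would still succeed but the non-ASCII characters could come
out wrong.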

Thanks, Neil. I'm not generating the file, just trying to parse it.
Your solution is precisely what I was looking for, even if I didn't
quite ask correctly. I appreciate the help!


-- Mike --

--
http://mail.python.org/mailman/listinfo/python-list
