On 4/27/2011 12:33 PM, Hegedüs Ervin wrote:
hello,

I'm using ElementTree to parse an XML file, but it stops at the
second record (id = 002), which contains a non-ASCII
character, ä. Here's the XML:

<?xml version="1.0"?>
<snapshot time="Mon Apr 25 08:47:23 PDT 2011">
<records>
<record id="001" education="High School" employment="7 yrs" />
<record id="002" education="Universität Bremen" employment="3 years" />
<record id="003" education="River College" employment="5 yrs" />
</records>
</snapshot>

The complaint offered up by the parser is

I've checked this XML with your script, and I think your locale
settings are wrong.

$ ./parse.py

XML file: test.xml
001 High School
002 Universität Bremen
003 River College

(name of xml file is "test.xml")

So I started changing the encoding declaration of the XML:

<?xml version="1.0" encoding="UTF-8" ?>  - same result
<?xml version="1.0" encoding="ISO-8859-2" ?>  - same result
<?xml version="1.0" encoding="ISO-8859-1" ?>  - same result

and then:
<?xml version="1.0" encoding="ascii" ?>  - gives the same error you
described.

Try changing the XML encoding.
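
The experiment above can be reproduced roughly as follows. This is only a sketch under some assumptions that are not stated in the thread: it uses Python 3, it assumes the file on disk is stored as UTF-8, and the helper name educations() is made up for illustration. It builds the same document in memory with a chosen encoding declaration and hands the bytes to ElementTree.

```python
import xml.etree.ElementTree as ET

# Same records as in test.xml, without the XML declaration.
BODY = (
    '<snapshot time="Mon Apr 25 08:47:23 PDT 2011">'
    '<records>'
    '<record id="001" education="High School" employment="7 yrs" />'
    '<record id="002" education="Universit\u00e4t Bremen" employment="3 years" />'
    '<record id="003" education="River College" employment="5 yrs" />'
    '</records>'
    '</snapshot>'
)


def educations(declared, actual="utf-8"):
    """Hypothetical helper: encode the document as `actual`, declare it
    as `declared` in the XML prolog, parse it, and return the
    education attribute of every record."""
    prolog = '<?xml version="1.0" encoding="%s"?>' % declared
    data = (prolog + BODY).encode(actual)
    root = ET.fromstring(data)  # raises ET.ParseError on bad input
    return [rec.get("education") for rec in root.iter("record")]


for declared in ("UTF-8", "ascii"):
    try:
        # ascii() keeps the printed output independent of the terminal locale.
        print(declared, "->", ascii(educations(declared)))
    except ET.ParseError as exc:
        print(declared, "-> ParseError:", exc)
```

With the declaration set to "ascii", the parser rejects the bytes that encode the ä, which matches the failure at record 002. Declaring the encoding the file is really stored in lets the parse run through all three records.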


a.

Thanks, Hegedüs and everyone else who responded. That is exactly it - I'm afraid I probably missed it in the docs because I was searching for terms like "unicode" and "coerce." In any event, that solves the problem. Thanks!

-- Mike --


--
http://mail.python.org/mailman/listinfo/python-list
