I'm using xml.sax.parseString to read an XML file. The file contains a few words in Russian and was written as UTF-8 by a C# program. In the example below, MyParser is my SAX ContentHandler subclass. My first try was:

import xml.sax

f = open('words.xml', 'r')
s = f.read()
xml.sax.parseString(s, MyParser())

This produced the following error:

Traceback (most recent call last):
  File "sax5.py", line 87, in ?
    xml.sax.parseString(s, MyParser())
  File "D:\Python\lib\xml\sax\__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "D:\Python\lib\xml\sax\xmlreader.py", line 125, in parse
    self.close()
  File "D:\Python\lib\xml\sax\expatreader.py", line 218, in close
    self._cont_handler.endDocument()
  File "sax5.py", line 81, in endDocument
    f.write(header + self.all + footer)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 745-751: ordinal not in range(128)
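
Note that the last frame in this traceback is the f.write call inside endDocument, so the exception is raised while the collected text is being written out, after the parser has already reached the end of the document. The following is only a minimal sketch, written for Python 3 and assuming the handler does nothing more than accumulate character data and write it out at the end. TextCollector, the sample document, and the in-memory output are hypothetical stand-ins, since MyParser and words.xml are not shown. The point it illustrates is encoding the text explicitly on output instead of relying on a default codec:

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical stand-in for MyParser: gathers all character data."""

    def __init__(self, out):
        super().__init__()
        self.out = out      # binary file-like object that receives the text
        self.all = []

    def characters(self, content):
        # SAX hands text to the handler as Unicode strings,
        # whatever encoding the input bytes were in.
        self.all.append(content)

    def endDocument(self):
        # Encode explicitly when writing, rather than relying on a default codec.
        self.out.write("".join(self.all).encode("utf-8"))


# Sample document (an assumption; the real words.xml is not shown).
data = '<?xml version="1.0" encoding="utf-8"?><w>слово</w>'.encode("utf-8")

sink = io.BytesIO()
xml.sax.parseString(data, TextCollector(sink))
result = sink.getvalue().decode("utf-8")
```

In a real script the BytesIO object would be replaced by a file opened in binary mode, or by a text-mode file opened with an explicit encoding.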


The XML declaration should be enough to tell the parser which encoding the file uses. Anyway, I read some previous posts and found that the unicode() function might help:

f = open('words.xml', 'r')
s = f.read()
u = unicode(s, "utf-8")
xml.sax.parseString(u, MyParser())

But I just got another error:

Traceback (most recent call last):
  File "sax5.py", line 87, in ?
    xml.sax.parseString(u, MyParser())
  File "D:\Python\lib\xml\sax\__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "D:\Python\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "D:\Python\lib\xml\sax\expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "D:\Python\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:30: encoding specified in XML declaration is incorrect


I see nothing wrong with my XML declaration:

<?xml version="1.0" encoding="utf-8"?>

And the file is indeed in UTF-8 (otherwise I wouldn't be able to open it in Internet Explorer and Firefox). I tried removing the BOM, but it didn't help. What else could be wrong?
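
For reference, checking for a UTF-8 byte order mark and removing it can be sketched as below. This is an illustration in Python 3, not the code actually used for the test described above; strip_utf8_bom is a hypothetical helper name and the sample bytes only stand in for the contents of words.xml. The codecs.BOM_UTF8 constant is part of the standard library.

```python
import codecs


def strip_utf8_bom(raw):
    """Return raw bytes without a leading UTF-8 byte order mark, if any."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical sample bytes standing in for the contents of words.xml.
with_bom = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><w/>'
cleaned = strip_utf8_bom(with_bom)
```

The file has to be read in binary mode for this check to see the raw bytes of the mark.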

Gustaf
