New submission from Serhiy Storchaka:

xmlparser.Parse() works with string data only if the XML encoding is UTF-8 (or ASCII). Examples:

>>> import xml.parsers.expat
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='utf-8'?><tag>\xb5</tag>")
1
>>> content
['µ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='iso8859'?><tag>\xb5</tag>")
1
>>> content
['µ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: encoding specified in XML declaration is incorrect: line 1, column 30

This affects all other modules that work with XML: xml.sax, xml.dom.minidom, xml.dom.pulldom, xml.etree.ElementTree.

Here is a patch that fixes parsing of string data with non-UTF-8 XML.

----------
assignee: serhiy.storchaka
components: Extension Modules, Unicode, XML
files: pyexpat_parse_str.patch
keywords: patch
messages: 181014
nosy: ezio.melotti, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Expat parser parses strings only when XML encoding is UTF-8
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4

Added file: http://bugs.python.org/file28916/pyexpat_parse_str.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17089>
_______________________________________
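
For reference, a possible caller-side workaround for the failure reported above is to encode the string to bytes in the declared encoding before handing it to Parse(), so that the bytes Expat sees agree with the XML declaration. This is only a minimal sketch for Python 3 and is not the content of pyexpat_parse_str.patch; the helper name parse_text and its encoding argument are hypothetical, and the caller is assumed to know which encoding the declaration names.

```python
import xml.parsers.expat


def parse_text(data, encoding):
    """Parse a str document by encoding it to bytes first.

    Hypothetical helper (not part of pyexpat): `encoding` is assumed to
    match the encoding named in the document's XML declaration.
    """
    parser = xml.parsers.expat.ParserCreate()
    content = []
    parser.CharacterDataHandler = content.append
    # Pass bytes, so Expat decodes them according to the XML declaration.
    parser.Parse(data.encode(encoding), True)
    # Character data may arrive in several chunks; join them.
    return "".join(content)


doc_utf16 = "<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>"
doc_latin1 = "<?xml version='1.0' encoding='iso-8859-1'?><tag>\xb5</tag>"

text_utf16 = parse_text(doc_utf16, "utf-16")
text_latin1 = parse_text(doc_latin1, "iso-8859-1")
```

Encoding with "utf-16" prepends a byte order mark, which Expat uses to detect the byte order. Both calls are expected to yield the single character U+00B5.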