Rickard Lindberg, 08.02.2011 16:57:
Hi,

Here is a shell script that reproduces my error:

#!/bin/sh
cat > å.timeline <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<timeline>
<version>0.13.0devb38ace0a572b+</version>
<categories>
</categories>
<events>
<event>
<start>2011-02-01 00:00:00</start>
<end>2011-02-03 08:46:00</end>
<text>asdsd</text>
</event>
</events>
<view>
<displayed_period>
<start>2011-01-24 16:38:11</start>
<end>2011-02-23 16:38:11</end>
</displayed_period>
<hidden_categories>
</hidden_categories>
</view>
</timeline>
EOF
python <<'EOF'
# encoding: utf-8
from xml.sax import parse
from xml.sax.handler import ContentHandler
parse(u"å.timeline", ContentHandler())
EOF

If I instead do

    parse(u"å.timeline".encode("utf-8"), ContentHandler())

the script runs without errors.

Is this a bug or expected behavior?

Expected behaviour. You cannot parse XML from unicode strings, especially
not when the XML data explicitly declares itself as being encoded in UTF-8.
Parse from a byte string instead, as you do in your fixed code.
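
A minimal Python 3 sketch of the advice above, assuming the goal is simply to hand the SAX parser bytes and let it decode them according to the XML declaration. The `TextCollector` handler and the trimmed-down sample document (with a non-ASCII character added to the event text for illustration) are hypothetical, not part of the original thread. `xml.sax.parseString` and `xml.sax.parse` are standard-library calls; `parse` also accepts a binary file object, which avoids passing a filename at all.

```python
# Python 3 sketch (hypothetical handler and sample data): give the SAX parser
# bytes, so it decodes them itself according to encoding="utf-8".
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: records the character data of every <text> element."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_text = False

    def startElement(self, name, attrs):
        if name == "text":
            self._in_text = True
            self.texts.append("")

    def endElement(self, name):
        if name == "text":
            self._in_text = False

    def characters(self, content):
        # The parser may deliver text in several chunks, so append them.
        if self._in_text:
            self.texts[-1] += content


# The declaration says UTF-8, so encode the document as UTF-8 bytes.
XML_BYTES = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<timeline><events><event><text>åsdsd</text></event></events></timeline>\n"
).encode("utf-8")

# Option 1: parse directly from a byte string.
handler = TextCollector()
xml.sax.parseString(XML_BYTES, handler)
print(handler.texts)

# Option 2: parse from a binary file-like object instead of a filename.
handler2 = TextCollector()
xml.sax.parse(io.BytesIO(XML_BYTES), handler2)
print(handler2.texts)
```

Both calls should print `['åsdsd']`: in each case the parser receives raw bytes and performs the UTF-8 decoding itself.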
Stefan
--
http://mail.python.org/mailman/listinfo/python-list