Rickard Lindberg, 08.02.2011 16:57:
Hi,

Here is a shell script to reproduce my error:

     #!/bin/sh

     cat > å.timeline <<EOF
     <?xml version="1.0" encoding="utf-8"?>
     <timeline>
       <version>0.13.0devb38ace0a572b+</version>
       <categories>
       </categories>
       <events>
         <event>
           <start>2011-02-01 00:00:00</start>
           <end>2011-02-03 08:46:00</end>
           <text>asdsd</text>
         </event>
       </events>
       <view>
         <displayed_period>
           <start>2011-01-24 16:38:11</start>
           <end>2011-02-23 16:38:11</end>
         </displayed_period>
         <hidden_categories>
         </hidden_categories>
       </view>
     </timeline>
     EOF

     python <<EOF
     # encoding: utf-8
     from xml.sax import parse
     from xml.sax.handler import ContentHandler
     parse(u"å.timeline", ContentHandler())
     EOF

If I instead do

     parse(u"å.timeline".encode("utf-8"), ContentHandler())

the script runs without errors.

Is this a bug or expected behavior?

Expected behaviour. You cannot parse XML from Unicode strings, especially not when the XML data explicitly declares itself as being encoded in UTF-8.

Parse from a byte string instead, as you do in your fixed code.
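
For illustration, here is a minimal sketch of feeding bytes to the SAX parser. It assumes Python 3, where `bytes` plays the role of the byte string and `str` is Unicode; the `TextCollector` handler and the shortened sample document are made up for this example and are not part of the original script. `parseString` takes the XML data itself, while `parse` takes a file name or a file-like object, so the second call wraps the same bytes in a binary stream.

```python
import io
from xml.sax import parse, parseString
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical helper: gathers the character data of every <text> element."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buffer = None  # list of chunks while inside <text>, else None

    def startElement(self, name, attrs):
        if name == "text":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver the text of one element in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "text":
            self.texts.append("".join(self._buffer))
            self._buffer = None


# The XML data as UTF-8 encoded bytes, matching its own encoding declaration.
xml_bytes = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<timeline><events><event><text>å event</text></event></events></timeline>"
).encode("utf-8")

# Variant 1: hand the bytes to the parser directly.
from_string = TextCollector()
parseString(xml_bytes, from_string)

# Variant 2: hand the parser a binary stream, as an opened file would provide.
from_stream = TextCollector()
parse(io.BytesIO(xml_bytes), from_stream)
```

In both variants the parser does the decoding itself, guided by the encoding declaration in the document, and the handler receives the element text as ordinary decoded strings.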

Stefan

--
http://mail.python.org/mailman/listinfo/python-list