On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <c...@rebertia.com> wrote:
>> Here is a bash script to reproduce my error:
>
> Including the error message and traceback is still helpful, for future
> reference.

Thanks for pointing it out.

>> #!/bin/sh
>>
>> cat > å.timeline <<EOF
> <snip>
>> EOF
>>
>> python <<EOF
>> # encoding: utf-8
>> from xml.sax import parse
>> from xml.sax.handler import ContentHandler
>> parse(u"å.timeline", ContentHandler())
>> EOF
>>
>> If I instead do
>>
>> parse(u"å.timeline".encode("utf-8"), ContentHandler())
>>
>> the script runs without errors.
>>
>> Is this a bug or expected behavior?
>
> Bug; open() figures out the filesystem encoding just fine.
> Bug tracker to report the issue to: http://bugs.python.org/
>
> Workaround:
> parse(open(u"å.timeline", 'r'), ContentHandler())

When I tried your workaround, I still got this error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
    parser.parse(filename_or_stream)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
    self.prepareParser(source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
    self._parser.SetBase(source.getSystemId())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)

The open(..) part works fine, but there still seems to be a problem
inside the sax parser.

--
Rickard Lindberg
--
http://mail.python.org/mailman/listinfo/python-list
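A possible way around this, offered as a rough sketch rather than a confirmed fix: the traceback fails in SetBase(source.getSystemId()), so the system id apparently still carries the unicode file name even when an open file is passed (presumably taken from the file object's name attribute). Building the InputSource by hand and giving it only a byte stream keeps the file name away from expat. RootNameHandler and parse_without_system_id are names made up for this example. It assumes the parser accepts a source with no system id, which has not been checked against the _xmlplus (PyXML) build shown in the traceback.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class RootNameHandler(ContentHandler):
    """Hypothetical handler: remembers the name of the first element seen."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


def parse_without_system_id(path, handler):
    """Parse the file at `path` through a byte stream only.

    The InputSource is created without a system id, so the (possibly
    non-ASCII) file name is never passed on to the expat parser.
    """
    parser = make_parser()
    parser.setContentHandler(handler)
    # open() handles the filesystem encoding of the name itself
    with io.open(path, "rb") as stream:
        source = InputSource()  # assumption: no system id is needed
        source.setByteStream(stream)
        parser.parse(source)
    return handler
```

With the file from the script above, the call would be parse_without_system_id(u"å.timeline", RootNameHandler()). One side effect of leaving the system id unset is that relative references inside the document (external entities, for example) have no base to resolve against.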