Stefan Behnel, 09.02.2011 09:58:
Rickard Lindberg, 09.02.2011 09:32:
On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <c...@rebertia.com> wrote:
Here is a bash script to reproduce my error:

Including the error message and traceback is still helpful, for future
reference.

Thanks for pointing it out.

#!/bin/sh

cat > å.timeline <<EOF
<snip>
EOF

python <<EOF
# encoding: utf-8
from xml.sax import parse
from xml.sax.handler import ContentHandler
parse(u"å.timeline", ContentHandler())
EOF

If I instead do

parse(u"å.timeline".encode("utf-8"), ContentHandler())

the script runs without errors.

Is this a bug or expected behavior?

Bug; open() figures out the filesystem encoding just fine.
Bug tracker to report the issue to: http://bugs.python.org/

Workaround:
parse(open(u"å.timeline", 'r'), ContentHandler())

When I tried your workaround, I still got this error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
    parser.parse(filename_or_stream)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
    self.prepareParser(source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
    self._parser.SetBase(source.getSystemId())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)

The open(..) part works fine, but there still seems to be a problem
inside the sax parser.

Did you read my reply?

Sorry, it was me who failed to read your question properly.

Unicode file names don't work reliably, especially in Python 2.x. Python 3.2 brings many improvements here.
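
Going by the traceback above, the failure happens when the parser passes the unicode system id to SetBase(). One possible way around it is to give the parser an InputSource that carries only a byte stream and no system id. This is a minimal sketch, not a confirmed fix: it assumes the parser calls SetBase() only when a system id is present (the standard library's expatreader does; this has not been verified against the _xmlplus copy), and parse_without_system_id is a made-up helper name.

```python
# -*- coding: utf-8 -*-
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def parse_without_system_id(path, handler):
    """Parse the file at `path` without handing the parser a system id.

    Hypothetical helper: open() deals with the unicode file name, and the
    SAX parser only ever sees an anonymous byte stream.
    """
    parser = make_parser()
    parser.setContentHandler(handler)
    stream = open(path, "rb")
    try:
        source = InputSource()        # system id is left as None
        source.setByteStream(stream)  # the parser reads raw bytes from here
        parser.parse(source)
    finally:
        stream.close()
```

It would be called as parse_without_system_id(u"å.timeline", ContentHandler()). The trade-off is that, without a system id, the parser has no base against which to resolve relative references to external entities.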

I assume your file system encoding is UTF-8? What does sys.getfilesystemencoding() give you?
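
A quick way to answer that, and to see the other encodings that may be involved, could look like the sketch below. describe_encodings is a hypothetical helper name; the calls it wraps are standard library functions. On Python 2 the default encoding is normally 'ascii', which would match the codec named in the error message.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python reports for this environment."""
    return {
        # used to convert unicode file names for the operating system
        "filesystem": sys.getfilesystemencoding(),
        # used for implicit unicode <-> byte string conversions
        "default": sys.getdefaultencoding(),
        # derived from the user's locale settings
        "locale": locale.getpreferredencoding(),
    }


for name, value in sorted(describe_encodings().items()):
    print("%s: %s" % (name, value))
```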

Stefan

--
http://mail.python.org/mailman/listinfo/python-list
