On Tue, 30 Nov 2010 16:57:57 -0800
Dan Stromberg <drsali...@gmail.com> wrote:
> >> --- On Tue, 11/30/10, Dan Stromberg <drsali...@gmail.com> wrote:
> >> > In Python 3, I'm finding that I have encoding issues with
> >> > characters with their high bit set. Things are fine with
> >> > strictly ASCII filenames. With high-bit-set characters, even
> >> > if I change stdin's encoding with:
> >> [...]
>
> I have the same problem using 3.2alpha4: the word man~ana (6
> characters long) in a filename causes problems (I'm catching the
> exception and skipping the file for now) despite using what I believe
> is an 8-bit, all 256-bytes-are-characters encoding: iso-8859-1. 'not
> sure if you wanted both of us to try this, or Yingjie alone though.

What do sys.stdin.encoding and sys.getfilesystemencoding() return?

If they are different, then that is the cause of the problem, since
open() uses sys.getfilesystemencoding() to encode filenames. In that
case, the solution is to encode the filenames yourself using
sys.stdin.encoding, or to read them as bytes directly from
sys.stdin.buffer (the binary, non-unicode counterpart of sys.stdin).

If they are the same, then I guess you can open an issue, provided you
give enough details for people to reproduce it :)

Regards

Antoine.
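
For reference, here is a minimal sketch of the check and the bytes-based workaround described above. The helper names (same_codec, read_filenames_as_bytes) are hypothetical, not part of the standard library. The sketch assumes one filename per line on stdin and a POSIX system, where open() accepts a bytes filename and passes it to the OS unchanged.

```python
import codecs
import sys


def same_codec(first, second):
    """Return True if two encoding names resolve to the same codec.

    codecs.lookup() normalizes aliases, so "latin-1" and "iso-8859-1"
    compare equal.
    """
    return codecs.lookup(first).name == codecs.lookup(second).name


def read_filenames_as_bytes(binary_lines):
    """Yield non-empty filenames, as bytes, from an iterable of byte lines.

    Pass sys.stdin.buffer to bypass sys.stdin's text decoding entirely.
    """
    for line in binary_lines:
        name = line.rstrip(b"\r\n")
        if name:
            yield name


def report_encodings():
    """Print the two encodings the question above asks about."""
    stdin_enc = sys.stdin.encoding
    fs_enc = sys.getfilesystemencoding()
    print("sys.stdin.encoding          :", stdin_enc)
    print("sys.getfilesystemencoding() :", fs_enc)
    # sys.stdin.encoding can be None if stdin has been replaced.
    if stdin_enc is not None and not same_codec(stdin_enc, fs_enc):
        print("The encodings differ; read names from sys.stdin.buffer.")
```

A possible use, under the same assumptions: call report_encodings() once, then loop with "for name in read_filenames_as_bytes(sys.stdin.buffer):" and open each file with open(name, "rb"). Because the names never pass through a text decoder, a mismatch between the two encodings cannot corrupt them.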