I've got a couple of programs that read filenames from stdin, and then
open those files and do things with them.  These programs sort of do
the *ix xargs thing without requiring xargs.

In Python 2, these work well.  Irrespective of how the filenames are
encoded, the files open fine, because it's all just a stream of
single-byte characters.

In Python 3, I'm finding that I have encoding issues with characters
with their high bit set.  Things are fine with strictly ASCII
filenames.  With high-bit-set characters, even if I change stdin's
encoding with:

      import io
      import sys
      STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')

...even with that, when I read a filename containing a single-character
Spanish ñ from stdin, the program cannot open that file: the ñ is
apparently converted internally to two bytes, but remains one byte in
the filesystem.  I decided to try ISO-8859-1 with Python 3 because I
have a Java program that hit a similar problem until I set the JVM's
stdin encoding via an en_US.ISO-8859-1 environment variable.

Python 2 shows the ñ as the single byte 0xf1 in os.listdir('.').
Python 3 with an encoding of ISO-8859-1 wants it to be 0xc3 followed
by 0xb1.
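For what it's worth, those two byte sequences are exactly the
ISO-8859-1 and UTF-8 encodings of U+00F1, which a quick interpreter
check confirms (suggesting Python 3 is re-encoding the filename with
the filesystem encoding, not ISO-8859-1, when it opens the file):

```python
# U+00F1 (LATIN SMALL LETTER N WITH TILDE) under the two encodings
ntilde = '\u00f1'
print(ntilde.encode('iso-8859-1'))  # single byte, as stored in the filesystem
print(ntilde.encode('utf-8'))       # the two bytes Python 3 tries to open
```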

Does anyone know what I need to do to read filenames from stdin with
Python 3.1 and subsequently open them, when some of those filenames
include characters with their high bit set?

TIA!
-- 
http://mail.python.org/mailman/listinfo/python-list