I've got a couple of programs that read filenames from stdin, and then open those files and do things with them. These programs sort of do the *ix xargs thing, without requiring xargs.
In Python 2, these work well. Irrespective of how filenames are encoded, things are opened OK, because it's all just a stream of single-byte characters.

In Python 3, I'm finding that I have encoding issues with characters with their high bit set. Things are fine with strictly ASCII filenames. With high-bit-set characters, even if I change stdin's encoding with:

    import io
    import sys
    STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')

...even with that, when I read a filename from stdin containing a single-character Spanish n~, the program cannot open that filename, because the n~ is apparently internally converted to two bytes but remains one byte in the filesystem.

I decided to try ISO-8859-1 with Python 3 because I have a Java program that encountered a similar problem until I used en_US.ISO-8859-1 in an environment variable to set the JVM's encoding for stdin.

Python 2 shows the n~ as 0xf1 in an os.listdir('.'). Python 3 with an encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.

Does anyone know what I need to do to read filenames from stdin with Python 3.1 and subsequently open them, when some of those filenames include characters with their high bit set?

TIA!
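
A minimal sketch of the byte-level mismatch described above, in case it helps anyone reproduce it without touching the filesystem. It assumes the filesystem encoding on the machine is UTF-8, which is a guess about the environment and not something confirmed here; the variable names are purely illustrative.

```python
# Illustration only: assumes the filesystem encoding is UTF-8 (an assumption,
# not a confirmed fact about the machine in question).
raw = b"\xf1"  # the single byte that os.listdir('.') shows under Python 2

# Reading stdin as ISO-8859-1 decodes that byte into a one-character str.
name = raw.decode("ISO-8859-1")

# A str filename has to be turned back into bytes before it reaches the OS.
# Under the UTF-8 assumption, that re-encoding yields two bytes.
as_utf8 = name.encode("UTF-8")

# Re-encoding with ISO-8859-1 instead round-trips to the original single byte.
as_latin1 = name.encode("ISO-8859-1")

print(repr(name), as_utf8, as_latin1)
```

If that assumption holds, the two-byte form is the 0xc3 0xb1 sequence mentioned above, while the name on disk is still the single byte 0xf1, so the lookup would not match.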