James Abbatiello <abb...@gmail.com> added the comment:

In what case(s) do you propose that the output be encoded in UTF-8? If output is to a terminal and that terminal is set to Latin-1 or cp437 or whatever, then outputting UTF-8 will only show garbage characters to the user.
If output is to a file, then using the encoding of the input file makes the most sense to me. Assume you have a simple program encoded in Latin-1 that prints out a string with some non-ASCII characters, and the patch is printed in UTF-8 and redirected to a file. The patch program has no idea what encodings are used; it will just compare the bytes in the original to the bytes in the patch file. These won't match since the encodings differ, and the patch will fail.

If the output is to a pipe then I'm not sure what the right thing is. It may be intended for display on the screen with something like `less`, or it may not. I don't think there's a good solution for this.

So, following the above logic, the patch attached here does the following (a rough sketch of the fallback order appears below):

1) If output is to a terminal (sys.stdout.encoding is set), use that encoding for the output.
2) Otherwise, if an encoding was determined for the input file, use that encoding for the output.
3) If all else fails, use 'ascii'. If the input contained non-ASCII characters and no encoding has been determined for the input, this will cause an exception to be raised.

I think case 3 can only happen when reading the input file from stdin. Perhaps that case needs to be looked at for how to detect the encoding of stdin.
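For concreteness, here is a minimal sketch of that fallback order. This is only an illustration, not the attached patch itself; the helper name choose_output_encoding() and the use of tokenize.detect_encoding() to recover the input file's declared encoding are my own assumptions.

    import sys
    import tokenize

    def choose_output_encoding(input_path=None):
        """Pick an output encoding using the fallback order described above
        (sketch only; the real patch may do this differently)."""
        # 1) Use sys.stdout.encoding when it is set -- the case treated
        #    above as "output is to a terminal".
        if getattr(sys.stdout, "encoding", None):
            return sys.stdout.encoding

        # 2) Otherwise fall back to whatever encoding was detected for the
        #    input file, e.g. from its coding cookie or BOM.
        if input_path is not None:
            with open(input_path, "rb") as f:
                encoding, _ = tokenize.detect_encoding(f.readline)
            return encoding

        # 3) Last resort: 'ascii'.  Writing non-ASCII output will then raise
        #    a UnicodeEncodeError, matching the behaviour described above.
        return "ascii"

In this sketch, step 3 is only reached when no input path is available (e.g. the input came from stdin), which is the case flagged above as needing further thought.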