Jason R. Coombs added the comment:

I've tested it and setting PYTHONIOENCODING='utf-8-sig' starts to get there. It 
causes Python to consume the BOM on stdin, but it also causes stdout to print a 
spurious non-printable character in the output:

C:\Users\jaraco> echo foo | ./print-input
foo

There is a non-printable character before foo. I've included it in this 
message. In PowerShell, it's rendered with a square before foo:

□foo

Using PowerShell under ConEmu, it appears as a space:

 foo

In cmd.exe, I see this:

C:\Users\jaraco>python -c "print('foo')"
foo


The space before the 'foo' apparently isn't a space at all.

Indeed, the input is being processed as desired, but the output now is not.

C:\Users\jaraco> python -c "print('bar')"
bar

(the non-printable character appears there too)

If I copy that text to the clipboard, I find that character is actually a 
\ufeff (zero-width no-break space, aka byte order mark). So setting the 
environment variable to use utf-8-sig for input simultaneously changes the 
output to use utf-8-sig as well.
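
To illustrate why one setting produces both effects, here is a small sketch 
of how the utf-8-sig codec behaves in each direction. It uses only the 
standard library; the variable names are made up for the example, and it 
assumes the piped bytes really are UTF-8 with a leading BOM.

```python
import codecs

# Bytes as they might arrive on stdin: a UTF-8 BOM followed by the text.
data_with_bom = codecs.BOM_UTF8 + b"foo\n"

# Decoding with utf-8-sig consumes the leading BOM.
decoded_sig = data_with_bom.decode("utf-8-sig")

# Decoding with plain utf-8 keeps the BOM as a U+FEFF character.
decoded_plain = data_with_bom.decode("utf-8")

# Encoding with utf-8-sig prepends a BOM, which is the stray
# character that shows up on stdout.
encoded_sig = "bar\n".encode("utf-8-sig")

print(repr(decoded_sig))
print(repr(decoded_plain))
print(encoded_sig)
```

The decode side is what I want for stdin; the encode side is what pollutes 
stdout.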

So it appears as if setting the environment variable would work for my purposes 
except that I only want to alter the input encoding and not the output encoding.
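
One possible workaround inside a script, rather than through the environment 
variable, is to re-wrap only the input stream. The following is a minimal 
sketch under that assumption: open_stdin_without_bom is a hypothetical helper 
name, not an existing API, and the BytesIO object stands in for 
sys.stdin.buffer so the example can run without a pipe.

```python
import io
import sys


def open_stdin_without_bom(binary_stdin=None):
    """Hypothetical helper: text reader that drops a leading UTF-8 BOM.

    Only the input side is wrapped, so sys.stdout keeps its own encoding.
    """
    if binary_stdin is None:
        binary_stdin = sys.stdin.buffer
    return io.TextIOWrapper(binary_stdin, encoding="utf-8-sig")


# Stand-in for piped input that starts with a BOM.
fake_stdin = io.BytesIO(b"\xef\xbb\xbffoo\n")
text = open_stdin_without_bom(fake_stdin).read()

# Input without a BOM passes through unchanged.
plain_text = open_stdin_without_bom(io.BytesIO(b"foo\n")).read()

print(repr(text))
print(repr(plain_text))
```

This leaves stdout alone, but it has to be repeated in every script that 
reads piped input, which is why I would prefer a better default.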

I think my goal is pretty basic - read text from standard input and write text 
to standard output on the primary shell included with the most popular 
operating system. I contend that goal should be easily achieved and 
straightforward on Python out of the box.

What does everyone think of the proposal that Python should simply default to 
utf-8-sig instead of utf-8 for stdin encoding?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21927>
_______________________________________