On 06/06/2012 08:09, Rurpy wrote:
On 06/05/2012 05:56 PM, MRAB wrote:
On 06/06/2012 00:34, Victor Stinner wrote:
2012/6/5 Rurpy<rurpy-/e1597as9lqavxtiumw...@public.gmane.org>:
In my first foray into Python3 I've encountered this problem:
I work in a multi-language environment. I've written a number
of tools, mostly command-line, that generate output on stdout.
Because these tools and their output are used by various people
in varying environments, the tools all have an --encoding option
to provide output that meets the needs and preferences of the
output's ultimate consumers.
What happens if the specified encoding is different than the encoding
of the console? Mojibake?
If the output is used as the input of another program, does the
other program use the same encoding?
In my experience, using an encoding different than the locale encoding
for input/output (stdout, environment variables, command line
arguments, etc.) causes various issues. So I'm curious about your use
cases.
In converting them to Python3, I found the best (if not very
pleasant) way to do this in Python3 was to put something like
this near the top of each tool[*1]:
import codecs
import sys
sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
In Python 3, you should use io.TextIOWrapper instead of
codecs.StreamWriter. It's more efficient and has fewer bugs.
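A minimal sketch of that replacement, for illustration only. The helper name rewrap is hypothetical, and an in-memory stream stands in for sys.stdout so the example is self-contained; io.TextIOWrapper and its encoding, errors and line_buffering parameters are standard library.

```python
import io

def rewrap(stream, encoding, errors="strict"):
    """Return a new text stream that writes to stream's binary buffer."""
    stream.flush()  # push out anything buffered under the old encoding
    return io.TextIOWrapper(stream.buffer, encoding=encoding, errors=errors,
                            line_buffering=stream.line_buffering)

# Demonstration on an in-memory stream standing in for sys.stdout.
raw = io.BytesIO()
old = io.TextIOWrapper(raw, encoding="ascii")
new = rewrap(old, "utf-8")
new.write("世界")
new.flush()
```

In a tool, the equivalent line would be sys.stdout = rewrap(sys.stdout, opts.encoding). The old wrapper has to stay referenced (sys.__stdout__ does this for the real stdout), because a discarded wrapper closes the shared buffer.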
What I want to be able to put there instead is:
sys.stdout.set_encoding (opts.encoding)
I don't think that your use case merits a new method on
io.TextIOWrapper: replacing sys.stdout does work and should be used
instead. TextIOWrapper is generic and your use case is specific to
sys.std* streams.
It would be surprising to change the encoding of an arbitrary file
after it is opened. At least, I don't see the use case.
[snip]
And if you _do_ want multiple encodings in a file, it's clearer to open
the file as binary and then explicitly encode to bytes and write _that_
to the file.
But is it really?
The following is very simple and the level of Python
expertise required is minimal. It (would) work fine
with redirection. One could substitute any other ordinary
open (for write) text file for sys.stdout.
[off the top of my head]
text = 'This is %s text: 世界へ、こんにちは!'
sys.stdout.set_encoding ('sjis')
print (text % 'sjis')
sys.stdout.set_encoding ('euc-jp')
print (text % 'euc-jp')
sys.stdout.set_encoding ('iso2022-jp')
print (text % 'iso2022-jp')
As for your suggestion, how do I reopen sys.stdout in
binary mode? I don't need to do that often and don't
know off the top of my head. (And it's too late for
me to look it up.) And what happens to redirected output
when I close and reopen the stream? I can open a regular
filename instead. But remember to make the last two
opens with "a" rather than "w". And don't forget the
"\n" at the end of the text line.
Could you show me a code example of your suggestion
for comparison?
Disclaimer: As I said before, I am not particularly
advocating for a set_encoding() method -- my
primary suggestion is a programmatic way to change the
sys.std* encodings prior to first use. Here I am just
questioning the claim that a set_encoding() method
would not be clearer than existing alternatives.
This example accesses the underlying binary output stream:
# -*- coding: utf-8 -*-
import sys

class Writer:
    """Wrap a text stream so its output encoding can be switched at any time."""

    def __init__(self, output):
        self.output = output
        self.encoding = output.encoding

    def write(self, string):
        # Encode explicitly and write to the underlying binary stream.
        self.output.buffer.write(string.encode(self.encoding))

    def flush(self):
        # Callers such as print(..., flush=True) expect a flush() method.
        self.output.buffer.flush()

    def set_encoding(self, encoding):
        # Flush bytes written under the old encoding before switching.
        self.output.buffer.flush()
        self.encoding = encoding

sys.stdout = Writer(sys.stdout)
initial_encoding = sys.stdout.encoding

text = 'This is %s text: 世界へ、こんにちは!'

sys.stdout.set_encoding('utf-8')
print (text % 'utf-8')
sys.stdout.set_encoding('sjis')
print (text % 'sjis')
sys.stdout.set_encoding('euc-jp')
print (text % 'euc-jp')
sys.stdout.set_encoding('iso2022-jp')
print (text % 'iso2022-jp')
sys.stdout.set_encoding(initial_encoding)
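For comparison, readers on Python 3.7 or later (released well after this thread) can get much the same effect without a wrapper class: io.TextIOWrapper gained a reconfigure() method that changes the encoding in place. A sketch, again using an in-memory stream as a stand-in for sys.stdout; the names raw and out are illustrative only.

```python
import io

raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")  # stand-in for sys.stdout

text = "This is %s text: 世界へ、こんにちは!"

for name in ("utf-8", "sjis", "euc-jp", "iso2022-jp"):
    # reconfigure() flushes pending output, then switches the encoding.
    out.reconfigure(encoding=name)
    print(text % name, file=out)

out.flush()
data = raw.getvalue()  # bytes in four different encodings, one line each
```

With a real stdout this would be sys.stdout.reconfigure(encoding=...), provided sys.stdout is still an io.TextIOWrapper and has not been replaced by some other object.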
--
http://mail.python.org/mailman/listinfo/python-list