Re: [Python-ideas] changing sys.stdout encoding

MRAB Wed, 06 Jun 2012 09:12:17 -0700

On 06/06/2012 08:09, Rurpy wrote:

On 06/05/2012 05:56 PM, MRAB wrote:

 On 06/06/2012 00:34, Victor Stinner wrote:

 2012/6/5 Rurpy<rurpy-/e1597as9lqavxtiumw...@public.gmane.org>:

  In my first foray into Python3 I've encountered this problem:
  I work in a multi-language environment.  I've written a number
  of tools, mostly command-line, that generate output on stdout.
  Because these tools and their output are used by various people
  in varying environments, the tools all have an --encoding option
  to provide output that meets the needs and preferences of the
  output's ultimate consumers.


 What happens if the specified encoding is different than the encoding
 of the console? Mojibake?

 If the output is used as the input of another program, does the
 other program use the same encoding?

 In my experience, using an encoding different than the locale encoding
 for input/output (stdout, environment variables, command line
 arguments, etc.) causes various issues. So I'm curious about your use
 cases.

  In converting them to Python3, I found the best (if not very
  pleasant) way to do this in Python3 was to put something like
  this near the top of each tool[*1]:

    import codecs
    sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)


 In Python 3, you should use io.TextIOWrapper instead of
 codecs.StreamWriter. It's more efficient and has fewer bugs.
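
A minimal sketch of the io.TextIOWrapper suggestion, not taken from the
thread itself. The helper name text_stream is hypothetical, and an
in-memory io.BytesIO stands in for sys.stdout.buffer so the resulting
bytes can be inspected:

```python
import io

def text_stream(binary_stream, encoding):
    # Hypothetical helper: wrap a binary stream in a text layer that
    # encodes with the requested encoding. line_buffering=True flushes
    # after every newline, as an interactive stdout usually does.
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            line_buffering=True)

# Demonstration on an in-memory buffer. In a real tool one would pass
# sys.stdout.buffer instead, e.g.
#     sys.stdout = text_stream(sys.stdout.buffer, opts.encoding)
sink = io.BytesIO()
out = text_stream(sink, "euc-jp")
print("世界", file=out)
encoded = sink.getvalue()
```

Here encoded holds the EUC-JP bytes for the text followed by the
platform's line ending.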

  What I want to be able to put there instead is:

    sys.stdout.set_encoding (opts.encoding)


 I don't think that your use case merits a new method on
 io.TextIOWrapper: replacing sys.stdout does work and should be used
 instead. TextIOWrapper is generic and your use case is specific to
 sys.std* streams.

 It would be surprising to change the encoding of an arbitrary file
 after it is opened. At least, I don't see the use case.

 [snip]

 And if you _do_ want multiple encodings in a file, it's clearer to open
 the file as binary and then explicitly encode to bytes and write _that_
 to the file.
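
A minimal sketch of that approach for an ordinary file, assuming a
temporary path (the name mixed.txt is only illustrative). The file is
opened once in binary mode, each line is encoded explicitly, and the
newline is part of the text:

```python
import os
import tempfile

text = "This is %s text: 世界へ、こんにちは！\n"
encodings = ["sjis", "euc-jp", "iso2022-jp"]

# Illustrative location only; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "mixed.txt")

# Open once in binary mode and encode each line by hand, so the file
# never needs to be reopened in append mode between encodings.
with open(path, "wb") as f:
    for enc in encodings:
        f.write((text % enc).encode(enc))

# Read the raw bytes back to confirm what was written.
with open(path, "rb") as f:
    data = f.read()
```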


But is it really?

The following is very simple and the level of python
expertise required is minimal.  It (would) work fine
with redirection.  One could substitute any other ordinary
open (for write) text file for sys.stdout.

   [off the top of my head]
   text = 'This is %s text: 世界へ、こんにちは！'
   sys.stdout.set_encoding ('sjis')
   print (text % 'sjis')
   sys.stdout.set_encoding ('euc-jp')
   print (text % 'euc-jp')
   sys.stdout.set_encoding ('iso2022-jp')
   print (text % 'iso2022-jp')

As for your suggestion, how do I reopen sys.stdout in
binary mode?  I don't need to do that often and don't
know off the top of my head.  (And it's too late for
me to look it up.)  And what happens to redirected output
when I close and reopen the stream?  I can open a regular
filename instead.  But remember to make the last two
opens with "a" rather than "w".  And don't forget the
"\n" at the end of the text line.

Could you show me a code example of your suggestion
for comparison?

Disclaimer: As I said before, I am not particularly
advocating for a set_encoding() method -- my
primary suggestion is a programmatic way to change the
sys.std* encodings prior to first use.  Here I am just
questioning the claim that a set_encoding() method
would not be clearer than existing alternatives.

This example accesses the underlying binary output stream:


# -*- coding: utf-8 -*-

import sys

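class Writer:
    """Stand-in for sys.stdout that encodes text itself before writing."""
    def __init__(self, output):
        self.output = output
        self.encoding = output.encoding
    def write(self, string):
        # Bypass the text layer and write encoded bytes to the binary buffer.
        self.output.buffer.write(string.encode(self.encoding))
    def flush(self):
        # Needed for print(..., flush=True) and for the flush of sys.stdout
        # at interpreter exit.
        self.output.buffer.flush()
    def set_encoding(self, encoding):
        # Push out bytes written in the old encoding before switching.
        self.flush()
        self.encoding = encoding

sys.stdout = Writer(sys.stdout)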

initial_encoding = sys.stdout.encoding

text = 'This is %s text: 世界へ、こんにちは！'
sys.stdout.set_encoding('utf-8')
print (text % 'utf-8')
sys.stdout.set_encoding('sjis')
print (text % 'sjis')
sys.stdout.set_encoding('euc-jp')
print (text % 'euc-jp')
sys.stdout.set_encoding('iso2022-jp')
print (text % 'iso2022-jp')

sys.stdout.set_encoding(initial_encoding)
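
A possible variant, sketched here under assumptions rather than taken
from the thread: instead of a custom class, the text layer can be
detached and the same binary buffer re-wrapped in a new
io.TextIOWrapper. The helper name switch_encoding is hypothetical, and
an io.BytesIO stands in for the real stdout buffer:

```python
import io

def switch_encoding(stream, encoding):
    # Hypothetical helper: flush pending text, detach the binary buffer
    # from the old text layer, and wrap it again with the new encoding.
    # The old text stream is unusable after detach().
    stream.flush()
    binary = stream.detach()
    return io.TextIOWrapper(binary, encoding=encoding, line_buffering=True)

# Demonstration on an in-memory buffer. For stdout this would be
#     sys.stdout = switch_encoding(sys.stdout, "euc-jp")
# which also leaves sys.__stdout__ detached, since it is the same object.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="sjis")
out.write("世界")
out = switch_encoding(out, "euc-jp")
out.write("世界")
out.flush()
result = sink.getvalue()
```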
--
http://mail.python.org/mailman/listinfo/python-list
