On Thu, May 15, 2014 at 11:41 AM, Victor Stinner <victor.stin...@enovance.com> wrote: > Hi, > > The functions safe_decode() and safe_encode() have been ported to Python 3, > and changed more than once. IMO we can still improve these functions to make > them more reliable and easier to use. > > > (1) My first concern is that these functions try to guess user expectation > about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as > the default encoding to decode, but this encoding depends on the locale > encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on > the Python major version. > > IMO the default encoding should be UTF-8 because most OpenStack components > expect this encoding. > > Or maybe users want to display data to the terminal, and so the locale > encoding should be used? In this case, locale.getpreferredencoding() would be > more reliable than sys.stdin.encoding.
>From what I can see, most uses of the module are in the client programs. If using locale to find a default encoding is the best approach, perhaps we should go ahead and make the change you propose. One place I see safe_decode() used in a questionable way is in heat in heat/engine/parser.py where validation errors are being re-raised as StackValidationFailed (line 376 in my version). It's not clear why the message is processed the way it is, so I would want to understand the history before proposing a change there. > > > (2) My second concern is that safe_encode(bytes, incoming, encoding) > transcodes the bytes string from incoming to encoding if these two encodings > are different. > > When I port code to Python 3, I'm looking for a function to replace this > common pattern: > > if isinstance(data, six.text_type): > data = data.encode(encoding) > > I don't want to modify data encoding if it is already a bytes string. So I > would prefer to have: > > def safe_encode(data, encoding='utf-8'): > if isinstance(data, six.text_type): > data = data.encode(encoding) > return data > > Changing safe_encode() like this will break applications relying on the > "transcode" feature (incoming => encoding). If such usage exists, I suggest to > add a new function (ex: "transcode" ?) with an API fitting this use case. For > example, the incoming encoding would be mandatory. > > Is there really applications using the incoming parameter? The only place I see that parameter used in integrated projects is in the tests for the module. I didn't check the non-integrated projects. Given its symmetry with safe_decode(), I don't really see a problem with the current name. Something like the shortcut you propose is present in safe_encode(), so I'm not sure what benefit a new function brings? Doug > > > So, what do you think about that? > > Victor > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev _______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev