Nick Coghlan added the comment:

After reviewing the stdlib code as Serhiy suggested and reflecting on the 
matter for a while, I now think it's better to frame this idea as formalising 
the concept of a "WSGI string": data that has been decoded as latin-1 not 
because that's necessarily correct, but because it creates a valid str object 
that doesn't lose any information and doesn't contain any surrogate escapes, 
yet can still carry arbitrary binary data.

Under that model, and using a dumps/loads inspired naming scheme (since this is 
effectively a serialisation format for the WSGI server/application boundary), 
the appropriate helpers would be:

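    def dump_wsgistr(data, encoding, errors='strict'):
        # Real text -> bytes in `encoding` -> latin-1 decoded "WSGI string"
        return data.encode(encoding, errors).decode('iso-8859-1')

    def load_wsgistr(data, encoding, errors='strict'):
        # "WSGI string" -> original bytes -> text decoded with `encoding`
        return data.encode('iso-8859-1').decode(encoding, errors)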

As Victor says, using surrogateescape by default is not correct. However, some 
of the code in wsgiref.handlers does pass a custom errors setting, so it's 
appropriate to make that configurable.
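A small usage sketch of the proposed helpers follows. They are repeated here 
with explicit return values so the example is self-contained; they are only a 
proposal in this comment, not an existing wsgiref API, and the sample values 
are assumptions chosen for illustration:

```python
def dump_wsgistr(data, encoding, errors='strict'):
    # Proposed helper: text -> latin-1 decoded "WSGI string".
    return data.encode(encoding, errors).decode('iso-8859-1')

def load_wsgistr(data, encoding, errors='strict'):
    # Proposed helper: "WSGI string" -> text in the real encoding.
    return data.encode('iso-8859-1').decode(encoding, errors)

text = 'caf\u00e9'                       # 'café'
wire = dump_wsgistr(text, 'utf-8')       # UTF-8 bytes viewed as latin-1
restored = load_wsgistr(wire, 'utf-8')   # back to the original text

# With the default errors='strict', invalid data raises instead of being
# silently smuggled through as surrogate escapes.
try:
    load_wsgistr('\xff', 'utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A caller can opt in to a different error handler explicitly.
replaced = load_wsgistr('\xff', 'utf-8', errors='replace')
```

The strict default keeps undecodable input visible as an error, while the 
errors parameter leaves room for call sites that need a custom handler.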

With this change, there would be several instances in wsgiref.handlers that 
could be changed from the current:

    data.encode(encoding).decode('iso-8859-1')

to:

    dump_wsgistr(data, encoding)

The point is that it isn't "iso-8859-1" that's significant - it's the 
compliance with the data format mandated by the WSGI 1.0.1 specification (which 
just happens to be "latin-1 decoded string").

----------
title: Add wsgiref.util.fix_decoding -> Add wsgiref.util helpers for dealing 
with "WSGI strings"

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22264>
_______________________________________