[issue27716] http.client truncates UTF-8 encoded headers

Cory Benfield Tue, 09 Aug 2016 06:50:38 -0700

Cory Benfield added the comment:

Honestly, David, everything's a mess on this front. The authoritative document 
here is RFC 7230 Section 3.2.4 
(https://tools.ietf.org/html/rfc7230#section-3.2.4). The last paragraph of that 
section reads:


   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.

In the case of http.client, this actually maps pretty closely to Python 3's bytes 
object: header field values are basically ASCII + arbitrary opaque bytes. While 
UTF-8 is not strictly called out as allowed, neither is it called out as 
forbidden.

In this case, I'd say that there's no need to be too pedantic about Latin 1 at 
this stage in the pipeline. Python 3 is welcome to decode using Latin 1 *after* 
the header block has been split, because at least then the value can be fixed 
up afterwards: Latin 1 round-trips every byte, so the original octets are still 
recoverable. But decoding before the split seems to just confuse the email 
parser.
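
To illustrate the ordering I mean, here is a minimal sketch. It is not the http.client code: the `split_headers` helper, the `X-Name` field and the sample bytes are all made up for the example. It splits the header block while it is still bytes, decodes each value as Latin 1 only afterwards, and then shows that a caller who knows the value was really UTF-8 can recover it.

```python
# Hypothetical raw header block as it might arrive off the wire.
# The X-Name value is UTF-8 encoded, i.e. it contains obs-text octets.
raw = (
    b"X-Name: " + "café ☃".encode("utf-8") + b"\r\n"
    b"Content-Length: 0\r\n"
)


def split_headers(block: bytes) -> dict:
    """Split a header block on bytes, then decode each value as Latin 1.

    Illustrative helper only; it ignores folding, duplicates, etc.
    """
    headers = {}
    for line in block.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # Latin 1 maps every byte to exactly one code point, so this
        # decode cannot fail and loses no information.
        headers[name.decode("ascii")] = value.strip(b" \t").decode("latin-1")
    return headers


headers = split_headers(raw)

# The Latin 1 view of UTF-8 bytes is mojibake, but it is reversible.
mojibake = headers["X-Name"]

# Fix-up: go back to the original octets, then decode as UTF-8.
fixed = mojibake.encode("latin-1").decode("utf-8")
```

The point of the sketch is only that the fix-up step at the end is possible because the split happened on the original octets; nothing here depends on the value being UTF-8 until the caller decides it is.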

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27716>
_______________________________________
