Cory Benfield added the comment: Honestly, David, everything's a mess on this front. The authoritative document here is RFC 7230 Section 3.2.4 (https://tools.ietf.org/html/rfc7230#section-3.2.4). The last paragraph of that section reads:
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data. In the case of http.client, actually maps pretty closely to Python 3's bytes object: header field values are basically ASCII + arbitrary opaque bytes. While UTF-8 is not strictly called out as allowed, neither is it called out as forbidden. In this case, I'd say that there's no need to be too pedantic about Latin 1 at this stage in the pipeline. Python 3 is welcome to decode using Latin 1 *after* the header block has been split, because at least then it can be fixed up due to the round-tripping nature of Latin 1. But doing it here seems to just confuse the email parser. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue27716> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com