Antoine Pitrou <pit...@free.fr> added the comment: Note that according to RFC 3977, “The character set for all NNTP commands is UTF-8”.
But it also says this about multi-line data blocks: Note that texts using an encoding (such as UTF-16 or UTF-32) that may contain the octets NUL, LF, or CR other than a CRLF pair cannot be reliably conveyed in the above format (that is, they violate the MUST requirement above). However, except when stated otherwise, this specification does not require the content to be UTF-8, and therefore (subject to that same requirement) it MAY include octets above and below 128 mixed arbitrarily. IMO, it should decode/encode by default using utf-8 (with the "surrogateescape" error handler for easy round-tripping with non-compliant servers), except for raw articles (bodies / envelopes) where bytes should be returned. ---------- nosy: +pitrou _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue9360> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com