On 19Nov2016 13:13, Kevin J. McCarthy <ke...@8t8.us> wrote:
> On #mutt, andrey_utkin_ reported getting a bounce trying to reply to a
> linux-kernel mailing list email. When he replied, vger.kernel.org
> bounced it because of raw utf-8 in a header.
> He posted a gist at
> <https://gist.github.com/andrey-utkin/b204666c34613858a34844283571ce17>
> I don't know how long those hang around, but the problem is in the
> References header: <201611170549.Q3WTfoMBþngguang...@intel.com>
> contains the utf-8 character "þ".
> Are any of you familiar with the rules for Message-ID and References
> headers?
Somewhat. Reviewing RFC 5322 right now to see how dated my knowledge is ...
https://tools.ietf.org/rfcmarkup/5322
> Should mutt be rfc-2047 encoding/decoding the references
> header?
No. RFC2047 encoded-words need to be whitespace-delimited from the surrounding
text, but no whitespace is permitted inside the "<" and ">" markers which
enclose a message-id:
https://tools.ietf.org/rfcmarkup/5322#section-3.6.4
The whitespace padding requirement is discussed in RFC2047 section 5:
https://tools.ietf.org/rfcmarkup?doc=2047#section-5
The RFC5322 message-id syntax prevents using RFC2047.
I think the cited message-id is simply illegal and unfixable. Mutt should
perhaps support it for stitching threads together, but arguably _not_ release
such a thing into the wild in new References: or In-Reply-To: headers.
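To make the "don't release it into the wild" idea concrete, here is a rough Python sketch of a check against the non-obsolete msg-id grammar in RFC 5322 section 3.6.4. The function names (is_valid_msg_id, safe_references) and the sample IDs are made up for illustration; this is not how mutt does anything, and it deliberately ignores the obsolete obs-id-left/obs-id-right forms.

```python
import re

# atext from RFC 5322 section 3.2.3: ASCII letters, digits and these specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: runs of atext separated by single dots.
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# no-fold-literal: "[" *dtext "]", where dtext is printable ASCII
# except "[", "]" and "\".
NO_FOLD_LITERAL = r"\[[\x21-\x5a\x5e-\x7e]*\]"
# msg-id without surrounding CFWS and without the obsolete forms.
MSG_ID = re.compile(
    rf"<{DOT_ATOM_TEXT}@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def is_valid_msg_id(value: str) -> bool:
    """Return True if value matches the strict RFC 5322 msg-id syntax."""
    return MSG_ID.fullmatch(value) is not None


def safe_references(msg_ids):
    """Keep only the ids that are safe to emit in a new References header."""
    return [m for m in msg_ids if is_valid_msg_id(m)]


if __name__ == "__main__":
    # Hypothetical ids; the second contains a raw non-ASCII character.
    ids = ["<20161117.abc123@example.com>", "<20161117.abc\u00fe123@example.com>"]
    print(safe_references(ids))
```

A client could still use the rejected ids internally for threading and only apply the filter when composing the outgoing headers.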
> What about the domain part - should we be idn encoding that
> part if $idn_encode is set?
Perhaps, if required. It looks like RFC3490's encoding is legal dot-atom-text for
RFC5322 (based on my reading of the Wikipedia article). The RFC is here:
https://tools.ietf.org/rfcmarkup?doc=3490
and the article I've consulted has the relevant section here:
https://en.wikipedia.org/wiki/Internationalized_domain_name#ToASCII_and_ToUnicode
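As an illustration of what ToASCII would do to the right-hand side of a message-id, here is a small Python sketch using the standard library's "idna" codec, which implements RFC 3490. The helper name idn_encode_msg_id_domain and the sample id are hypothetical, and it only touches the domain part; a non-ASCII character in the left-hand side (as in the case reported above) is not helped by this at all.

```python
def idn_encode_msg_id_domain(msg_id: str) -> str:
    """Return msg_id with its domain part passed through IDNA ToASCII.

    Hypothetical helper: the left-hand side is copied through untouched.
    """
    inner = msg_id.strip()
    if not (inner.startswith("<") and inner.endswith(">")):
        raise ValueError("message-id must be enclosed in angle brackets")
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = inner[1:-1].rpartition("@")
    if not sep:
        raise ValueError("message-id has no '@'")
    # Python's built-in "idna" codec applies RFC 3490 ToASCII per label.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"<{local}@{ascii_domain}>"


if __name__ == "__main__":
    print(idn_encode_msg_id_domain("<abc123@b\u00fccher.example>"))
```

One caveat on the design: rewriting the domain changes the id string, so other clients that compare message-ids byte for byte may no longer match it against the original.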
Cheers,
Cameron Simpson <c...@zip.com.au>