On Wed, 2 Jul 2014 11:37:33 -0700 (PDT)
John Hardin <jhar...@impsec.org> wrote:

> Nope. The content-transfer-encoding is only for the *transfer* part
> of the process. Once the content reaches the MUA that content can be
> further parsed by the MUA according to other encoding rules, such as
> these escape sequences for Unicode characters.

I don't think so.  Any MUA that tried to convert "&#x0435;" to a
Unicode character in a text/plain part with implicit US-ASCII charset
and 7bit content transfer encoding is broken.  An MUA should display
exactly "&#x0435;" in this situation.  It's a different story for
text/html parts, of course.
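A minimal sketch of the distinction, assuming Python's standard `email` and `html` modules stand in for an MUA. The sample body is hypothetical. Decoding a text/plain part with no charset parameter and 7bit encoding leaves "&#x0435;" as eight literal ASCII characters. Only an HTML-aware step such as `html.unescape` turns it into U+0435 (CYRILLIC SMALL LETTER IE), which looks almost exactly like a Latin "e".

```python
import html
from email import message_from_string, policy

# Hypothetical text/plain part: no charset parameter (so US-ASCII) and 7bit CTE.
raw = (
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "The letter &#x0435; here\n"
)
msg = message_from_string(raw, policy=policy.default)

# What a text/plain renderer should show: the reference stays literal.
plain_text = msg.get_content()
print(repr(plain_text))

# Only HTML-style parsing converts the reference into a Cyrillic character.
as_if_html = html.unescape(plain_text)
print(repr(as_if_html))
```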

> That's perfectly valid. How else would you send, for example, a
> c-cedilla in Spanish text via a 7-bit-clean channel?

With the appropriate charset and content-transfer-encoding, such as
ISO-8859-1, quoted-printable, and =E7.
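For illustration, here is a minimal Python sketch of that approach. It assumes the standard library's `MIMEText`, which picks quoted-printable as the body encoding for an ISO-8859-1 charset. The word "Façade" is just a sample. The byte 0xE7 travels as the 7-bit-safe sequence "=E7" and decodes back to "ç".

```python
from email.mime.text import MIMEText

# Declare the charset; the email package then applies quoted-printable
# as the Content-Transfer-Encoding for ISO-8859-1.
msg = MIMEText("Façade", "plain", "iso-8859-1")

wire = msg.as_string()
print(wire)  # the body line carries "=E7" in place of the raw 0xE7 byte

# The receiving side reverses both layers: QP first, then the charset.
body = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(body)
```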

> I would say that's more a case of those characters shouldn't be
> present if the language is en-us than an encoding issue. The presence
> of lots of those is either a sign that the text isn't English, or is
> obfuscated. How do you reliably tell the language of the message?

I would say the presence of &#xABCD; in a text/plain part is either
a bug in spam-generating software or a researcher trying to send
something to a colleague. :)
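As a rough heuristic along those lines, a filter could count numeric character references that appear in text/plain parts only, and ignore text/html parts where they are legitimate. This is a hypothetical Python sketch, not SpamAssassin rule syntax. The function name `count_plain_ncrs` and the sample message are made up for illustration.

```python
import re
from email import message_from_string, policy

# Matches hex (&#xABCD;) and decimal (&#1077;) numeric character references.
NCR = re.compile(r"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);")


def count_plain_ncrs(raw: str) -> int:
    """Hypothetical helper: count NCRs found in text/plain parts only."""
    msg = message_from_string(raw, policy=policy.default)
    total = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            total += len(NCR.findall(part.get_content()))
    return total


sample = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="XX"',
    "",
    "--XX",
    "Content-Type: text/plain",
    "",
    "Fr&#x0435;&#x0435; offer",
    "--XX",
    "Content-Type: text/html",
    "",
    "<p>Fr&#x0435;&#x0435; offer</p>",
    "--XX--",
    "",
])
print(count_plain_ncrs(sample))  # the two references in the HTML part are not counted
```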

Regards,

David.
