On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
> On 7/7/2014 2:28 AM, John Wilcock wrote:
>> On 05/07/2014 19:08, Philip Prindeville wrote:
>>> As for encoding a cyrillic small a: there are many ways to do this.
>>> iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this
>>> would be very efficient; there are just too many charsets possible.
>>
>> Normalising the input message to UTF-8 before body checks would help
>> somewhat with that. I seem to remember there's been talk of doing this.
>>
> Yes, or utf-16... I think that will be necessary to keep SA effective in the
> modern world sooner than later.

Okay, but… if the message body is non-ASCII, the CTE is 8bit or base64, and no explicit charset has been given, how do you know which translation to perform?

I get a lot of Han spam in GB2312 where the charset is never specified (apparently it’s a national default in China, despite the requirements stated in RFC 2045 and RFC 2046).

-Philip
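
To make the undeclared-charset question concrete, here is a minimal Python sketch of one possible fallback heuristic: try the declared charset if there is one, then a fixed list of candidates with strict decoding, and keep the first that succeeds. The function name `guess_and_decode` and the candidate order are assumptions for illustration only; this is not how SpamAssassin handles it.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical fallback order, chosen only for this illustration.
CANDIDATES = ("ascii", "utf-8", "gb2312")


def guess_and_decode(
    raw: bytes,
    declared: Optional[str] = None,
    candidates: Sequence[str] = CANDIDATES,
) -> Tuple[str, str]:
    """Return (charset_used, decoded_text) for a message body.

    The declared charset, if any, is tried first; then each candidate
    in order. Decoding is strict, so a candidate is rejected as soon
    as it meets a byte sequence it cannot represent.
    """
    order = ([declared] if declared else []) + list(candidates)
    for name in order:
        try:
            return name, raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset label or invalid bytes: try the next one.
            continue
    # Last resort: latin-1 maps every byte, so this cannot fail,
    # but the resulting text may be meaningless.
    return "latin-1", raw.decode("latin-1")


if __name__ == "__main__":
    body = "你好".encode("gb2312")  # body with no charset declared
    print(guess_and_decode(body))
```

The weakness is the one raised above: strict decoding only rules a charset out, it never confirms one. Most single-byte charsets accept nearly any byte sequence, so this approach cannot distinguish, say, koi8-r from windows-1251, and a real implementation would probably need statistical detection on top.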