On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

> On 7/7/2014 2:28 AM, John Wilcock wrote:
>> Le 05/07/2014 19:08, Philip Prindeville a écrit :
>>> As for encoding a cyrillic small a: there are many ways to do this.
>>> iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this
>>> would be very efficient—there are just too many charsets possible.
>> 
>> Normalising the input message to UTF-8 before body checks would help 
>> somewhat with that. I seem to remember there's been talk of doing this.
>> 
> Yes, or utf-16...  I think that will be necessary to keep SA effective in the 
> modern world sooner than later.


Okay, but… if the message body is non-ASCII and the CTE is 8bit or base64 and 
no explicit charset has been given, how do you know which translation to 
perform?

I get a lot of Han spam in GB2312 where the charset is never specified 
(apparently it’s a national default in China, despite the requirements stated 
in RFC 2045 and RFC 2046).
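
To make the problem concrete, here is a minimal sketch of the only generic approach I can think of: try a short list of candidate charsets in order and keep the first one that decodes without error. This is not how SpamAssassin works; the function name guess_decode and the candidate list are hypothetical, chosen purely for illustration. It is also only a guess: many legacy multibyte charsets accept each other's byte sequences, so a clean decode does not prove the charset was right.

```python
from typing import Iterable, Tuple

# Hypothetical candidate order: strictest encodings first, so that
# plain ASCII and well-formed UTF-8 are recognised before legacy charsets.
DEFAULT_CANDIDATES = ("ascii", "utf-8", "gb2312")


def guess_decode(raw: bytes,
                 candidates: Iterable[str] = DEFAULT_CANDIDATES
                 ) -> Tuple[str, str]:
    """Return (charset_name, text) for the first candidate that decodes
    raw without error. Falls back to latin-1, which maps every byte to
    a code point and therefore never fails (but may yield mojibake)."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    return "latin-1", raw.decode("latin-1")


if __name__ == "__main__":
    # A body sent as GB2312 with no charset parameter in Content-Type.
    body = "你好".encode("gb2312")
    print(guess_decode(body))
```

The ordering is the whole design: put a permissive charset too early and it will swallow input meant for a later one, which is exactly why an undeclared charset is a problem.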

-Philip