I write:
>> I don't think I've ever received a UTF-8 Korean spam,
dman <[EMAIL PROTECTED]> writes:
> That's why someone needs to convert the characters to ks_c_5601-1987
> and euc-kr for SA's tests.

Most of the spam is coming through as 8-bit ks_c_5601-1987. That's
what the test should look for (and it's what I just checked into the
CVS tree) ... because it works.

I agree with Matt that SA should eventually decode all QP and base64
text into 8-bit characters so that SA tests can always test 8-bit, but
that just means that a bit more spam will match the
KOREAN_UCE_SUBJECT test.

Whether we should embark on converting between encodings
(ks_c_5601-1987 to UTF-8, etc.) is another question. That is a much
more complicated problem, and it is probably only useful for
language-specific word tests for languages that are sent in more than
one encoding.

> It would be nice if I could junk other foreign-language messages too
> since I can't read them. That's a little harder to detect (ie when
> they are iso8859-1 or utf-8).

A solution is pending. Take a look at:

  http://bugzilla.spamassassin.org/show_bug.cgi?id=293

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
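To illustrate the "decode QP and base64 first, then always test 8-bit" idea from the message above, here is a rough sketch in Python. It is not SpamAssassin code (SA is written in Perl), and it is not the actual KOREAN_UCE_SUBJECT rule. The function names and the set of charsets are hypothetical, and the sketch only covers RFC 2047 encoded-words in a Subject header; a Subject that arrives as raw, unencoded 8-bit is not handled here.

```python
from email.header import decode_header

# Assumption for illustration: the two charset labels named in this thread.
KOREAN_CHARSETS = {"ks_c_5601-1987", "euc-kr"}


def decoded_chunks(raw_subject):
    """Decode QP/base64 encoded-words into (bytes, charset) pairs.

    No charset conversion is done: the bytes stay in their original
    encoding, which is all an 8-bit test needs.
    """
    chunks = []
    for value, charset in decode_header(raw_subject):
        # decode_header returns str when the header has no encoded-words.
        if isinstance(value, str):
            value = value.encode("ascii", "replace")
        chunks.append((value, (charset or "us-ascii").lower()))
    return chunks


def has_korean_8bit(raw_subject):
    """Hypothetical check: 8-bit bytes in a chunk labelled as Korean."""
    for data, charset in decoded_chunks(raw_subject):
        if charset in KOREAN_CHARSETS and any(b >= 0x80 for b in data):
            return True
    return False


if __name__ == "__main__":
    # The same four bytes, once as quoted-printable and once as base64.
    print(has_korean_8bit("=?ks_c_5601-1987?Q?=B1=A4=B0=ED?="))
    print(has_korean_8bit("=?EUC-KR?B?saSw7Q==?="))
    print(has_korean_8bit("Meeting notes"))
```

The point of the design is that the test after decoding looks at the same bytes whether the sender used QP, base64, or neither, so no conversion between ks_c_5601-1987 and UTF-8 is needed for this kind of check.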