"Derrick 'dman' Hudson" <[EMAIL PROTECTED]> writes:

> I got a piece of Korean spam yesterday that SA (2.20) didn't mark at
> all.  It was multipart/alternative with a text/plain and text/html
> segment both koi8-r.  My ok_locales setting is "en".

You can also add "ok_languages en" to your configuration if you are
running 2.30.  The language guessing is slow (adds something like 50% to
total processing time), but filters out a lot of foreign-language spam.
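For reference, a minimal fragment combining both settings might look like the one below. It is a sketch that assumes 2.30, as noted above. Where it goes (a per-user user_prefs file or a site-wide config file) depends on your installation.

```conf
# Locales whose character sets you expect to receive mail in
ok_locales   en

# Languages you expect mail to be written in, guessed from the
# message text (2.30); noticeably slower, as noted above
ok_languages en
```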

I'm also working on a new version of the KOREAN_UCE_SUBJECT rule that
matches a few additional Subject: headers.  I don't know a drop of
Korean, but certain hexadecimal strings seemed to be very common in
Korean spam and they're almost always enclosed in some sort of
bracketing characters.
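A rule of this shape can be sketched in SpamAssassin's rule syntax. This is a hypothetical illustration, not the updated KOREAN_UCE_SUBJECT rule. The post doesn't list the actual strings, so the rule name LOCAL_KOREAN_UCE_SUBJECT, the single byte string, the bracket set and the score are all assumptions. The bytes \xb1\xa4\xb0\xed are the EUC-KR encoding of 광고 ("advertisement") and stand in for one such string.

```conf
# Hypothetical local rule; NOT the KOREAN_UCE_SUBJECT rule in HEAD.
# Matches one assumed EUC-KR byte string (\xb1\xa4\xb0\xed = "광고")
# when it is enclosed in (), [] or <> in the Subject header.
# Assumes the Subject is matched as raw EUC-KR bytes.
header   LOCAL_KOREAN_UCE_SUBJECT  Subject =~ /[\(\[<]\s*\xb1\xa4\xb0\xed\s*[\)\]>]/
describe LOCAL_KOREAN_UCE_SUBJECT  Bracketed Korean ad marker in Subject (example)
score    LOCAL_KOREAN_UCE_SUBJECT  1.0
```

A fuller rule would list several byte strings as alternatives, e.g. (?:...|...), between the same bracket classes.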

Thanks to Galeon (View menu, Encoding option, Korean option, EUC-KR
option) and a Korean-English dictionary on the web, I was also able to
eventually translate parts of those KOREAN_UCE_SUBJECT headers.  Very
interesting stuff.  Here's what I came up with:

  http://www.pathname.com/~quinlan/korean-test.html

The updated rule which I'll check into HEAD catches about 25% more
Korean spam.

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk