"Derrick 'dman' Hudson" <[EMAIL PROTECTED]> writes: > I got a piece of korean spam yesterday that SA (2.20) didn't mark at > all. It was multipart/alternative with a text/plain and text/html > segment both koi8-r. My ok_locales setting is "en".

You can also add "ok_languages en" to your configuration if you are
running 2.30. The language guessing is slow (it adds something like 50%
to total processing time), but it filters out a lot of foreign-language
spam.

I'm also working on a new version of the KOREAN_UCE_SUBJECT rule that
matches a few additional Subject: headers. I don't know a drop of
Korean, but certain hexadecimal strings seemed to be very common in
Korean spam, and they're almost always enclosed in some sort of
bracketing characters.

Thanks to Galeon (View menu, Encoding option, Korean option, EUC-KR
option) and a Korean-English dictionary on the web, I was eventually
able to translate parts of those KOREAN_UCE_SUBJECT headers. Very
interesting stuff. Here's what I came up with:

  http://www.pathname.com/~quinlan/korean-test.html

The updated rule, which I'll check into HEAD, catches about 25% more
Korean spam.

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
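
For anyone who wants to experiment with the bracketed-byte-string idea outside SpamAssassin, here is a rough Python sketch. It is not the actual KOREAN_UCE_SUBJECT rule. The marker term (the Korean word for "advertisement"), the set of bracket characters, and the function name are all assumptions made for illustration, and it assumes the Subject: header arrives as raw, undecoded EUC-KR bytes.

```python
import re

# Hypothetical marker terms. The byte strings used by the real rule are
# not given in this message; "advertisement" is just a plausible example.
MARKER_TERMS = ["광고"]

# Raw EUC-KR bytes for each term, escaped for use inside a bytes regex.
_markers = b"|".join(re.escape(term.encode("euc_kr")) for term in MARKER_TERMS)

# An opening bracket, any run of non-closing bytes, a marker term,
# any run of non-closing bytes, then a closing bracket.
BRACKETED_MARKER = re.compile(
    rb"[\[({<][^\])}>]*(?:" + _markers + rb")[^\])}>]*[\])}>]"
)


def looks_like_korean_uce(raw_subject: bytes) -> bool:
    """Return True if the raw Subject bytes contain a bracketed marker term."""
    return BRACKETED_MARKER.search(raw_subject) is not None
```

Matching on bytes rather than decoded text mirrors the "hexadecimal strings" approach: nothing has to guess the charset first, at the cost of missing subjects sent in another encoding or as MIME encoded-words.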