On Sat, Jun 15, 2002 at 06:14:36PM -0700, Daniel Quinlan wrote:
| "Derrick 'dman' Hudson" <[EMAIL PROTECTED]> writes:
|
| > I got a piece of Korean spam yesterday that SA (2.20) didn't mark at
| > all. It was multipart/alternative with a text/plain and text/html
| > segment, both koi8-r. My ok_locales setting is "en".
|
| You can also add "ok_languages en" to your configuration if you are
| running 2.30.
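A minimal sketch of the two settings discussed here. It assumes they go in a per-user ~/.spamassassin/user_prefs file or the site-wide local.cf. Per the reply above, ok_languages needs 2.30.

```
# Accept only English locales (character sets) and English-language text.
# Mail outside these lists can then pick up points from SpamAssassin's
# foreign-charset / unwanted-language rules.
ok_locales en
ok_languages en
```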
Yours is the first (accepted) message to run through 2.30 :-). (I
installed the Debian package, then restarted exim seconds before your
post arrived.) Before starting exim with the new version, I ran all my
"low" scoring spam through it to compare the scores. I have the score
added to the subject, so comparing the two was really easy (just watch
the subject line, in 2 columns), and I have spam >10.0 rejected at SMTP
time. Many of these spams will now score much higher than that, and
only 3 scored lower than with 2.20.

| The language guessing is slow (adds something like 50% to total
| processing time), but filters out a lot of foreign-language spam.

I might try that. The reformime trick should be relatively lightweight.

| I'm also working on a new version of the KOREAN_UCE_SUBJECT rule that
| matches a few additional Subject: headers. I don't know a drop of
| Korean, but certain hexadecimal strings seemed to be very common in
| Korean spam and they're almost always enclosed in some sort of
| bracketing characters.
|
| Thanks to Galeon (View menu, Encoding option, Korean option, EUC-KR
| option)

Why didn't Galeon realize that on its own? (I saw you put the tag in.)

| and a Korean-English dictionary on the web, I was also able to
| eventually translate parts of those KOREAN_UCE_SUBJECT headers. Very
| interesting stuff. Here's what I came up with:
|
| http://www.pathname.com/~quinlan/korean-test.html
|
| The updated rule which I'll check into HEAD catches about 25% more
| Korean spam.

Cool. Do those newly tagged spams have 8-bit headers or properly
RFC 2047-encoded headers?

-D

-- 
Your mouse has moved. You must restart Windows for your changes
to take effect.

http://dman.ddts.net/~dman/
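Below is a minimal Python sketch of one way to answer that question for a raw Subject: value. It is not SpamAssassin code, which is Perl. The sketch reports whether the header carries raw 8-bit bytes or RFC 2047 encoded-words, then decodes it. Unlabeled 8-bit text is assumed to be EUC-KR, because such headers carry no charset. Finally it applies a hypothetical bracketed-Hangul check, loosely modelled on the description of KOREAN_UCE_SUBJECT. The function names, the EUC-KR fallback, and the regex are illustrative assumptions. The real rule's strings are not given in this thread.

```python
import re
from email.header import decode_header

# RFC 2047 encoded-word: =?charset?B|Q?encoded-text?=
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")

# Hypothetical heuristic: an opening bracket, one or more Hangul
# syllables (U+AC00..U+D7A3) with optional spaces, and a closing bracket.
BRACKETED_HANGUL = re.compile(r"[(\[<{]\s*[\uac00-\ud7a3]+\s*[)\]>}]")


def header_encoding(raw: bytes) -> str:
    """Classify a raw Subject: value as '8bit', 'rfc2047' or 'ascii'."""
    if any(b >= 0x80 for b in raw):
        return "8bit"  # unlabeled high-bit bytes straight in the header
    if ENCODED_WORD.search(raw):
        return "rfc2047"
    return "ascii"


def decode_subject(raw: bytes, fallback: str = "euc_kr") -> str:
    """Decode a raw Subject: value to text."""
    if header_encoding(raw) == "8bit":
        # 8-bit headers carry no charset label, so a charset must be guessed.
        return raw.decode(fallback, errors="replace")
    parts = []
    for chunk, charset in decode_header(raw.decode("ascii")):
        if isinstance(chunk, str):  # returned when there are no encoded-words
            parts.append(chunk)
            continue
        try:
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        except LookupError:  # unknown charset label in the encoded-word
            parts.append(chunk.decode("latin-1"))
    return "".join(parts)


def looks_like_korean_uce(raw: bytes) -> bool:
    """Hypothetical check: bracketed Hangul anywhere in the decoded Subject."""
    return BRACKETED_HANGUL.search(decode_subject(raw)) is not None
```

To answer the question directly, feed the raw bytes of each newly tagged Subject: line to `header_encoding()` and tally the results. The EUC-KR fallback only comes into play for the 8-bit case.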