On Sat, Jun 15, 2002 at 06:14:36PM -0700, Daniel Quinlan wrote:
| "Derrick 'dman' Hudson" <[EMAIL PROTECTED]> writes:
| 
| > I got a piece of Korean spam yesterday that SA (2.20) didn't mark at
| > all.  It was multipart/alternative with a text/plain and text/html
| > segment both koi8-r.  My ok_locales setting is "en".
| 
| You can also add "ok_languages en" to your configuration if you are
| running 2.30.

Yours is the first (accepted) message to run through 2.30 :-).  (I
installed the Debian package, then restarted exim seconds before your
post arrived.)

Before starting exim with the new version, I ran all my "low" scoring
spam through it to compare the scores.  I have the score added to the
subject, so comparing the two was really easy (just view the subject
lines in 2 columns).  I have spam scoring >10.0 rejected at SMTP time;
many of these spams now score much higher than that, and only 3
scored lower than with 2.20.
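
For anyone who wants to repeat that comparison without eyeballing two
columns, here is a minimal sketch.  It assumes the score is written into
the subject as a tag like "[SPAM 12.3]"; that tag format, and the names
score_from_subject and compare_runs, are made up for illustration and are
not anything SA or exim provides.

```python
import re

# Hypothetical subject tag format, e.g. "[SPAM 12.3] Buy now".
SCORE_TAG = re.compile(r"\[SPAM (-?\d+(?:\.\d+)?)\]")

def score_from_subject(subject):
    """Return the score embedded in the subject, or None if there is none."""
    m = SCORE_TAG.search(subject)
    return float(m.group(1)) if m else None

def compare_runs(old_subjects, new_subjects, reject_above=10.0):
    """Count how scores moved between two runs over the same messages.

    Both lists must be in the same message order.  Pairs where either
    subject has no score tag are skipped.
    """
    summary = {"higher": 0, "lower": 0, "same": 0, "now_rejected": 0}
    for old, new in zip(old_subjects, new_subjects):
        a, b = score_from_subject(old), score_from_subject(new)
        if a is None or b is None:
            continue
        if b > a:
            summary["higher"] += 1
        elif b < a:
            summary["lower"] += 1
        else:
            summary["same"] += 1
        # Messages that would now cross the SMTP-time rejection threshold.
        if b > reject_above:
            summary["now_rejected"] += 1
    return summary
```

Feed it the subject lines from the old and new runs in the same order and
it reports how many scores went up, down, or stayed put.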

| The language guessing is slow (adds something like 50% to total
| processing time), but filters out a lot of foreign-language spam.

I might try that.  The reformime trick should be relatively lightweight.
 
| I'm also working on a new version of the KOREAN_UCE_SUBJECT rule that
| matches a few additional Subject: headers.  I don't know a drop of
| Korean, but certain hexadecimal strings seemed to be very common in
| Korean spam and they're almost always enclosed in some sort of
| bracketing characters.
| 
| Thanks to Galeon (View menu, Encoding option, Korean option, EUC-KR
| option)

Why didn't Galeon realize that on its own?
(I saw you put the tag in)

| and a Korean-English dictionary on the web, I was also able to
| eventually translate parts of those KOREAN_UCE_SUBJECT headers.  Very
| interesting stuff.  Here's what I came up with:
| 
|   http://www.pathname.com/~quinlan/korean-test.html
| 
| The updated rule which I'll check into HEAD catches about 25% more
| Korean spam.
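
To make the "bracketed strings" idea concrete, here is a rough sketch of
that kind of test.  It is not the KOREAN_UCE_SUBJECT rule itself, just an
illustration: it assumes the Subject arrives as raw bytes and looks for a
run of two-byte sequences in the 0xA1-0xFE range (where EUC-KR puts its
double-byte characters) enclosed in some kind of brackets.  The function
name is made up.

```python
import re

# An opening bracket, one or more pairs of high-bit bytes, a closing bracket.
# The byte range 0xA1-0xFE is where EUC-KR two-byte characters live.
BRACKETED_HIGH_BIT = re.compile(
    rb"[\[({<]\s*(?:[\xa1-\xfe]{2})+\s*[\])}>]"
)

def looks_like_bracketed_8bit(subject: bytes) -> bool:
    """True if the raw subject contains a bracketed run of high-bit pairs."""
    return BRACKETED_HIGH_BIT.search(subject) is not None
```

A subject with no brackets, or with only ASCII inside the brackets, does
not match.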

Cool.  Do those newly tagged spams have 8-bit headers or properly
RFC2047 encoded headers?
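
For reference, this is roughly how I would tell the two cases apart on a
raw header value.  It is only a sketch: classify_header is a made-up name,
and the encoded-word pattern is a simplified reading of the RFC 2047
syntax (=?charset?B-or-Q?text?=), not a full validator.

```python
import re

# Simplified RFC 2047 encoded-word: =?charset?B?text?= or =?charset?Q?text?=
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def classify_header(raw: bytes) -> str:
    """Classify a raw header value as 'raw-8bit', 'rfc2047', or 'ascii'."""
    # Any byte above 0x7F means unencoded 8-bit data in the header.
    if any(b > 0x7F for b in raw):
        return "raw-8bit"
    if ENCODED_WORD.search(raw):
        return "rfc2047"
    return "ascii"
```

A header that mixes raw 8-bit bytes with encoded-words is reported as
"raw-8bit", since the 8-bit check runs first.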

-D

-- 

                          Your mouse has moved.
       You must restart Windows for your changes to take effect.
 
http://dman.ddts.net/~dman/
