My test set is 1789 messages, about 25% of them spam.  Only a few of
the tests in this set look usable, but I figure it's good to share
some of the failures too.

1. Subject =~ /^\s*Re:/i
   lacking "In-Reply-To" or "References" header

   (idea for test from
   http://linuxconf.unixtech.be/configurations/mutt/mutt.color.index.html)

   RESULT: 60 matches, only 15% were spam (terrible rule)

   What if we exempt the two most prevalent guilty mailers ("Internet
   Mail Service" and "Lotus Notes")?

   RESULT: 25 matches, 9 were spam (36% spam, not a great rule)
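
   A minimal sketch of test 1 in Python, for anyone who wants to try
   it on their own corpus.  The function name is_fake_reply and the
   EXEMPT_MAILERS tuple are made up for illustration, and matching the
   two mailers against the "X-Mailer" header is an assumption; the
   original run may have identified them differently.

```python
import re
from email import message_from_string

# Same pattern as the rule above: Subject starts with "Re:" (any case).
RE_SUBJECT = re.compile(r"^\s*Re:", re.IGNORECASE)

# Assumption: the exempted mailers are recognised by X-Mailer substring.
EXEMPT_MAILERS = ("Internet Mail Service", "Lotus Notes")


def is_fake_reply(raw, exempt_mailers=()):
    """True if the message claims to be a reply but carries no
    In-Reply-To or References header (and is not from an exempt mailer)."""
    msg = message_from_string(raw)
    if not RE_SUBJECT.search(msg.get("Subject", "")):
        return False
    if msg.get("In-Reply-To") or msg.get("References"):
        return False
    mailer = msg.get("X-Mailer", "")
    return not any(name in mailer for name in exempt_mailers)
```

   Calling is_fake_reply(raw) gives the first variant of the rule;
   is_fake_reply(raw, EXEMPT_MAILERS) gives the variant with the
   exemption.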

2. Message-Id tests

   (idea for test from
   http://linuxconf.unixtech.be/configurations/mutt/mutt.color.index.html)

   As the author notes, might be good to also check the RFC.

   Key:

   TEST = the rule
   MATCH = number matched out of 1789 messages
   MSG = number of messages already flagged by current SA Message-Id tests
   BAD = number of spam in MATCH
   GOOD = number of non-spam in MATCH
   RESULT = my assessment of the test

   (sorted by RESULT)

   TEST                  MATCH   MSG     BAD     GOOD   RESULT
   =~ /[{:%#|/]/         23      2       22      1      great test
   =~ /[.]>/             2       0       2       0      good test
   !~ /@.*[.]/           163     11      87      76     so-so test
   =~ /@>/               8       8       8       0      duplicates existing
   =~ /<.*</             1       1       1       0      duplicates existing
   !~ /@/                2       2       2       0      duplicates existing
   !~ /</                12      12      12      0      duplicates existing
   =~ /<>/               0       0       0       0      bad test
   =~ /<.* .*>/          29      29      3       26     bad test
   =~ /localhost/        65      0       1       64     bad test
   =~ /localdomain/      78      0       0       78     bad test
   =~ /[.][a-z]>/        0       0       0       0      bad test
   =~ /[.][a-z]{4,}>/    76      0       2       74     bad test

   I further revised the first test to be simply:

     Message-Id =~ /[#/,:]/

   which removed the false positive.  The '%' is used by a large
   corporation.  I never saw '|' or '{', so I removed them for now.
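
   A rough Python sketch of the revised Message-Id test, together with
   a small tally that produces the MATCH/BAD/GOOD columns from the key
   above.  The names msgid_looks_odd and tally are hypothetical, and
   the input is assumed to be (message_id, is_spam) pairs from a
   hand-labelled corpus.

```python
import re

# Revised rule: Message-Id contains any of '#', '/', ',' or ':'.
RE_MSGID_ODD = re.compile(r"[#/,:]")


def msgid_looks_odd(message_id):
    """True if the Message-Id matches the revised rule."""
    return bool(RE_MSGID_ODD.search(message_id))


def tally(samples):
    """Count matches over (message_id, is_spam) pairs.

    Returns a dict with the MATCH, BAD and GOOD columns.
    """
    bad = good = 0
    for message_id, is_spam in samples:
        if msgid_looks_odd(message_id):
            if is_spam:
                bad += 1
            else:
                good += 1
    return {"MATCH": bad + good, "BAD": bad, "GOOD": good}
```

   Swapping in a different pattern for RE_MSGID_ODD makes it easy to
   compare candidate rules against the same labelled set.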

3. Lots of "X-" headers.

   (idea for test from http://silenroc.com/angel/filter2.html)

   3 or more: 593 matched, 142 spam
   4 or more: 370 matched, 55 spam
   5 or more: 194 matched, 18 spam

   Hmmm... this does not seem to be working.  What about the reverse?

   none: 341 matched, 202 spam
   1 or fewer: 900 matched, 258 spam
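
   To make the comparison with the roughly 25% base rate explicit,
   the counts above can be turned into spam percentages.  This is
   only arithmetic on the numbers already listed; the COUNTS dict and
   spam_rate helper are illustrative names.

```python
# (matched, spam) pairs copied from the counts above.
COUNTS = {
    "3 or more": (593, 142),
    "4 or more": (370, 55),
    "5 or more": (194, 18),
    "none": (341, 202),
    "1 or fewer": (900, 258),
}


def spam_rate(matched, spam):
    """Percentage of matched messages that were spam."""
    return 100.0 * spam / matched


if __name__ == "__main__":
    for label, (matched, spam) in COUNTS.items():
        print(f"{label:>10}: {spam_rate(matched, spam):.1f}% spam")
```

   That works out to roughly 24%, 15% and 9% for the "or more"
   thresholds, and about 59% and 29% for the two reversed tests, so
   only the "none" case stands well clear of the base rate.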

   Weird.  The no "X-" header test is not too bad.  Worth trying, I
   think.

   I now believe I misinterpreted the web page: it was supposed to be
   "X-x", but we already have a test for that and it seems to work
   okay.  Still, the no "X-" header test might be worth trying.
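
   A possible way to count "X-" headers in Python, as a sketch only.
   The helper names are hypothetical, and the count is over header
   names beginning with "X-" in any case, which is my reading of the
   test rather than anything the web page specifies.

```python
from email import message_from_string


def count_x_headers(raw):
    """Number of headers whose name starts with 'X-' (case-insensitive)."""
    msg = message_from_string(raw)
    return sum(1 for name in msg.keys() if name.lower().startswith("x-"))


def has_no_x_headers(raw):
    """The reversed test: the message carries no 'X-' headers at all."""
    return count_x_headers(raw) == 0
```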

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
