Barry McLarnon <[EMAIL PROTECTED]> wrote:
On Jul 11, 2003 01:43 am, Lucas Albers wrote:
> How exactly did you determine what your hit percentage was for
> DCC,Razor and your RBL's?
...
Nothing fancy... I had the spam archived in one mbox file, so I just
used grep on that file to find the test names (such as RAZOR2_CHECK),
piped through wc -l to count the occurrences.  Then I used a
calculator. :-)  Of course, I'm counting on the fact that strings
like that are unlikely to occur in the body of a spam message.  Maybe
someone on the list has come up with a more sophisticated tool for
header analysis of SA-filtered mail?
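
One way to sidestep the worry about test names turning up in message bodies is to look only at the headers. The sketch below is a rough illustration, not a tool anyone in this thread used: it assumes the archive is an mbox file and that SpamAssassin wrote its test names into an X-Spam-Status header. The function name count_rule_hits is made up for the example.

```python
import mailbox
import re


def count_rule_hits(mbox_path, rule_name):
    """Return (hits, total): messages whose X-Spam-Status header names the rule.

    Assumption: SpamAssassin recorded its tests in X-Spam-Status. Adjust the
    header name if your setup writes them somewhere else.
    """
    # Match the rule as a whole token so a longer rule name that merely
    # starts with the same letters is not counted.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(rule_name) + r"(?![A-Za-z0-9_])"
    )
    hits = 0
    total = 0
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            total += 1
            # Only the header is searched, so body text cannot inflate the count.
            status = str(message.get("X-Spam-Status", ""))
            if pattern.search(status):
                hits += 1
    finally:
        box.close()
    return hits, total
```

Calling count_rule_hits("spam.mbox", "RAZOR2_CHECK") would return the hit count and the message count together, which replaces the calculator step; the file name here is only a placeholder.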

Well, this isn't much more sophisticated, but since I use MIMEDefang to call SpamAssassin (which then calls Razor, Pyzor, and various RBLs), I use MIMEDefang's logging functions to record the list of hits for each message along with a label: "not_spam", "labeling_spam", or "rejecting_spam", depending on which threshold the score reaches.


I can then pull out all lines from the log that contain both "MDLOG" and "_spam" and quickly find stats by running grep -c on a rule name.

The main disadvantage of this approach is that the logged results have not been hand-sorted to remove false positives.
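
For anyone who would rather not run grep -c once per rule and per label, the log-based tally described above could be sketched roughly as follows. This is only an illustration: the function name tally_rule is invented, and the code assumes nothing about the log format beyond what is stated above, namely that each relevant line contains "MDLOG", one of the three labels, and the names of the rules that hit.

```python
from collections import Counter

# The three labels written to the log, depending on the score threshold reached.
LABELS = ("not_spam", "labeling_spam", "rejecting_spam")


def tally_rule(log_lines, rule_name):
    """Count log lines mentioning rule_name, grouped by spam label."""
    counts = Counter()
    for line in log_lines:
        # Same filter as the two greps: keep lines with both markers.
        if "MDLOG" not in line or "_spam" not in line:
            continue
        # Plain substring match, which is what grep -c on a rule name does.
        if rule_name not in line:
            continue
        for label in LABELS:
            if label in line:
                counts[label] += 1
                break
    return counts
```

Feeding it an open log file, for example tally_rule(open(path), "RAZOR2_CHECK") with path pointing at wherever your syslog puts MIMEDefang output, gives the hit count per label in one pass. Like grep, it matches substrings, so a rule whose name is a prefix of another rule's name would be overcounted.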

Due to a permissions problem, Pyzor wasn't running on my system for the past few weeks, which is why I didn't have any stats at the start of the thread. Over the last 3 days, here's what I've seen:

        Total   SA Hit        SA Miss
Razor:  3259    3090  43%     169  1.4%
Pyzor:  2854    2720  38%     134  1.1%
Both:   1520    1519  21%       1  0%
Either: 4593    4291  60%     302  2.5%

SpamAssassin totals:
Identified as Spam:      7209  37%
Identified as Non-Spam: 12283  63%
Total Messages:         19492
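
As a quick sanity check on the figures above, the "Either" row should follow from the other three rows by inclusion-exclusion (Razor + Pyzor - Both), and the SpamAssassin totals should add up. The short sketch below redoes that arithmetic using only the counts from the tables; the helper name union_count is made up for the illustration.

```python
def union_count(a, b, both):
    """Messages in at least one of two sets, given each size and the overlap."""
    return a + b - both


# Counts copied from the tables above.
sa_spam, sa_nonspam, total_messages = 7209, 12283, 19492

either_hit = union_count(3090, 2720, 1519)   # flagged as spam by SA
either_miss = union_count(169, 134, 1)       # not flagged as spam by SA
either_total = union_count(3259, 2854, 1520)

print(either_hit, either_miss, either_total)
print(sa_spam + sa_nonspam == total_messages)
print(round(100 * sa_spam / total_messages), round(100 * sa_nonspam / total_messages))
```

The three computed counts match the "Either" row (4291, 302, and 4593), and the two shares round to the 37% and 63% shown in the totals.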

WHAT'S MISSING:
These have *not* been checked for false negatives/positives. As far as my own mailbox goes, all the SA misses I've seen in Razor or Pyzor were spam that just didn't get enough points. Based on my own experience, I would guess that there are more false negatives than false positives.


WHAT'S INTERESTING:
* Fully 60% of mail that SpamAssassin identified as spam was found in at least one of Razor or Pyzor.
* That's a 39% improvement over using Razor alone, or a 58% increase over using Pyzor alone.
* Out of ~12,000 messages SpamAssassin marked as non-spam, only one showed up in both databases.
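
For reference, the percentages in the list above can be reproduced from the table counts. The sketch below shows the arithmetic; relative_gain is just a name chosen for this example.

```python
def relative_gain(combined, single):
    """Percentage increase of the combined count over a single-source count."""
    return 100.0 * (combined - single) / single


# SA-identified spam found by each check (counts from the table above).
sa_spam = 7209
razor, pyzor, either = 3090, 2720, 4291

coverage = 100.0 * either / sa_spam          # share of SA spam seen by either
gain_over_razor = relative_gain(either, razor)
gain_over_pyzor = relative_gain(either, pyzor)

print(f"{coverage:.1f}% {gain_over_razor:.1f}% {gain_over_pyzor:.1f}%")
```

Rounded to whole numbers, these come out to the 60%, 39%, and 58% quoted above.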



Kelson Vibber
SpeedGate Communications <www.speed.net>



