On Jul 11, 2003 01:43 am, Lucas Albers wrote:
> How exactly did you determine what your hit percentage was for
> DCC, Razor and your RBL's?

...

Nothing fancy... I had the spam archived in one mbox file, so I just used grep on that file to find the test names (such as RAZOR2_CHECK), piped through wc -l to count the occurrences. Then I used a calculator. :-) Of course, I'm counting on the fact that strings like that are unlikely to occur in the body of a spam message. Maybe someone on the list has come up with a more sophisticated tool for header analysis of SA-filtered mail?
Well, this isn't much more sophisticated, but since I use MIMEDefang to call SpamAssassin (which then calls Razor, Pyzor, and various RBLs), I use MIMEDefang's logging functions to record the list of hits for each message along with a label: "not_spam", "labeling_spam", or "rejecting_spam", depending on what threshold the score reaches.
I can then pull out all lines from the log that contain both "MDLOG" and "_spam" and quickly find stats by running grep -c on a rule name.
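For anyone who would rather script this than run grep by hand, here is a rough Python sketch of the same counting. The log layout is an assumption: it relies only on each relevant line containing "MDLOG", one of the three labels, and the rule names as plain substrings (the same thing grep matches on). The function name `count_rule_hits` and the sample lines are made up for illustration and are not real MIMEDefang output.

```python
from collections import Counter

# The three labels written to the log, as described above.
LABELS = ("not_spam", "labeling_spam", "rejecting_spam")

def count_rule_hits(lines, rules):
    """Count, per label, how many MDLOG lines mention each rule name."""
    counts = {label: Counter() for label in LABELS}
    for line in lines:
        if "MDLOG" not in line:
            continue  # not one of our log entries
        # Work out which label this line carries; skip it if there is none.
        label = next((name for name in LABELS if name in line), None)
        if label is None:
            continue
        # Plain substring match, equivalent to grep -c on the rule name.
        for rule in rules:
            if rule in line:
                counts[label][rule] += 1
    return counts

# Hypothetical sample lines; the real log format may differ.
sample = [
    "MDLOG,id1,rejecting_spam,12.4,RAZOR2_CHECK,PYZOR_CHECK",
    "MDLOG,id2,labeling_spam,6.1,RAZOR2_CHECK",
    "MDLOG,id3,not_spam,1.2,PYZOR_CHECK",
    "some unrelated line mentioning RAZOR2_CHECK",
]
hits = count_rule_hits(sample, ["RAZOR2_CHECK", "PYZOR_CHECK"])
for label in LABELS:
    print(label, dict(hits[label]))
```

To run it against a real log, pass an open file object in place of `sample`. As with grep, a rule name that happens to appear in a logged subject line would be counted too.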
The main disadvantage of this approach is that the mail has not been hand-sorted to remove false positives.
Due to a permissions problem, Pyzor wasn't running on my system for the past few weeks, which is why I didn't have any stats at the start of the thread. Over the last 3 days, here's what I've seen:
          Total    SA Hit         SA Miss
Razor:    3259     3090  43%      169  1.4%
Pyzor:    2854     2720  39%      134  1.1%
Both:     1520     1519  21%        1  0%
Either:   4593     4291  60%      302  2.5%
SpamAssassin totals:
Identified as Spam:        7209  37%
Identified as Non-Spam:   12283  63%
Total Messages:           19492
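The percentages can be rechecked with a few lines of Python instead of a calculator. This sketch assumes the "SA Hit" percentages are taken against the 7209 messages identified as spam and the "SA Miss" percentages against the 12283 identified as non-spam. That reading matches most of the reported figures, but it is an inference, and a recomputed value may differ from a reported one by a point or so.

```python
# Counts copied from the figures above.
SPAM_TOTAL = 7209    # messages SpamAssassin identified as spam
HAM_TOTAL = 12283    # messages SpamAssassin identified as non-spam

# (SA hit, SA miss) for each check.
counts = {
    "Razor": (3090, 169),
    "Pyzor": (2720, 134),
    "Both": (1519, 1),
    "Either": (4291, 302),
}

def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

for name, (sa_hit, sa_miss) in counts.items():
    print(f"{name:7s} {pct(sa_hit, SPAM_TOTAL):5.1f}% of spam   "
          f"{pct(sa_miss, HAM_TOTAL):5.2f}% of non-spam")
```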
WHAT'S MISSING:
These have *not* been checked for false negatives/positives. As far as my own mailbox goes, all the SA misses I've seen in Razor or Pyzor were spam that just didn't get enough points. Based on my own experience, I would guess that there are more false negatives than false positives.
WHAT'S INTERESTING:
* Fully 60% of mail that SpamAssassin identified as spam was found in at least one of Razor or Pyzor.
* That's a 39% improvement over using Razor alone, or a 58% increase over using Pyzor alone.
* Out of ~12,000 messages SpamAssassin marked as non-spam, only one showed up in both databases.
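The arithmetic behind the "Either" row and the improvement figures is simple enough to sketch in Python. The counts are the SA Hit numbers reported above. The helper name `relative_gain` is just for illustration.

```python
# SA Hit counts from the stats above.
razor_hits = 3090
pyzor_hits = 2720
both_hits = 1519

# Messages caught by at least one database: add the two counts,
# then subtract the overlap so it is not counted twice.
either_hits = razor_hits + pyzor_hits - both_hits

def relative_gain(combined, single):
    """Percentage increase of combined over single."""
    return 100.0 * (combined - single) / single

gain_over_razor = relative_gain(either_hits, razor_hits)
gain_over_pyzor = relative_gain(either_hits, pyzor_hits)

print(f"either: {either_hits}")
print(f"gain over Razor alone: {gain_over_razor:.0f}%")
print(f"gain over Pyzor alone: {gain_over_pyzor:.0f}%")
```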
Kelson Vibber
SpeedGate Communications <www.speed.net>