Robert Fitzpatrick wrote:
> I've seen some others on the list here show reports of the different
> rules and how much they hit.

Most of them are quoting the ones out of the official ruleset mass-check results. Those are in the tarball under the rules directory as STATISTICS*.txt.

> How can I produce these reports?

Generate a hand-sorted set (corpus) of spam and nonspam messages and then feed them into the mass-check tool.

Note: this is really a developer tool, so its use should be considered "advanced".

See also: http://wiki.apache.org/spamassassin/MassCheck

> And is it
> possible to produce a report like this by domain name?

By domain? Sure, create a separate corpus for each.

That said, it sounds like you're thinking of using this to monitor your live mail feeds. It's impossible to produce these reports accurately on live email. You must have a hand-sorted set of spam and nonspam to work with, so that the tool knows for sure whether a rule is matching spam or nonspam.

If you try to build it off a live feed and use SA's marking as the spam criterion, your statistics are useless. Any rule with a high enough score would get "perfect" results: all the mail it matched would be counted as spam, and none as nonspam.

You have, essentially, created a "self-fulfilling prophecy". The higher a rule scores, the more likely it is that messages matching it will be tagged as spam, even if they're not really spam.
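
To make the circularity concrete, here is a toy sketch in Python. It is not mass-check and does not use any SpamAssassin API; the rule names, scores, threshold and messages are all made up for illustration. It counts rule hits twice over the same six messages: once against the hand-sorted labels, and once using a score-threshold verdict as the label.

```python
from collections import Counter

# All values below are hypothetical, chosen only to illustrate the point.
THRESHOLD = 5.0
SCORES = {"BIG_RULE": 6.0, "SMALL_RULE": 1.0}

# (hand-sorted label, set of rules the message hit)
MESSAGES = [
    ("spam", {"BIG_RULE", "SMALL_RULE"}),
    ("spam", {"BIG_RULE"}),
    ("spam", {"SMALL_RULE"}),
    ("ham", {"BIG_RULE"}),   # a real false positive for BIG_RULE
    ("ham", {"SMALL_RULE"}),
    ("ham", set()),
]


def verdict(rules):
    """Label a message by summing rule scores against the threshold."""
    total = sum(SCORES[rule] for rule in rules)
    return "spam" if total >= THRESHOLD else "ham"


def hit_counts(messages, label_of):
    """Count, per rule, how many hits fall under each label."""
    counts = {rule: Counter() for rule in SCORES}
    for true_label, rules in messages:
        label = label_of(true_label, rules)
        for rule in rules:
            counts[rule][label] += 1
    return counts


# Ground truth: trust the hand-sorted label.
hand_sorted = hit_counts(MESSAGES, lambda true_label, rules: true_label)

# Live feed: the only label available is the scanner's own verdict.
live_feed = hit_counts(MESSAGES, lambda true_label, rules: verdict(rules))

for rule in sorted(SCORES):
    print(rule, "hand-sorted:", dict(hand_sorted[rule]),
          "live feed:", dict(live_feed[rule]))
```

With the hand-sorted labels, BIG_RULE shows its one false positive (2 spam hits, 1 nonspam hit). With the verdict as the label, the same rule shows 3 spam hits and 0 nonspam hits, because its own score pushed the nonspam message over the threshold.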