Robert Fitzpatrick wrote:
> I've seen some others on the list here show reports of the different
> rules and how much they hit. 
Most of them are quoting the official ruleset's mass-check results.
Those are in the tarball, under the rules directory, as
STATISTICS*.txt.
> How can I produce these reports?
Generate a hand-sorted set (corpus) of spam and nonspam messages, then
feed it into the mass-check tool.

Note: this is really a developer tool, so its use should be considered
"advanced".

See also:
http://wiki.apache.org/spamassassin/MassCheck
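For a rough idea of what those reports summarize, here is a minimal
Python sketch. It assumes you already have, for each hand-sorted
message, its true label and the set of rules it hit. That input format
and the compute_stats helper are hypothetical; this is not mass-check's
own log format. SPAM% and HAM% are the share of each corpus a rule
hits, and S/O is computed here as SPAM% / (SPAM% + HAM%), which should
match the meaning of that column in the official reports.

```python
from collections import Counter


def compute_stats(messages):
    """Per-rule hit statistics from a hand-sorted corpus.

    messages: list of (is_spam, set_of_rule_names) pairs (hypothetical format).
    Returns {rule: (spam_pct, ham_pct, s_o)}.
    """
    n_spam = sum(1 for is_spam, _ in messages if is_spam)
    n_ham = len(messages) - n_spam
    spam_hits, ham_hits = Counter(), Counter()
    for is_spam, rules in messages:
        # Count each rule once per message, on the side the hand-sort put it.
        (spam_hits if is_spam else ham_hits).update(rules)
    stats = {}
    for rule in set(spam_hits) | set(ham_hits):
        spam_pct = 100.0 * spam_hits[rule] / n_spam if n_spam else 0.0
        ham_pct = 100.0 * ham_hits[rule] / n_ham if n_ham else 0.0
        total = spam_pct + ham_pct
        # S/O: how strongly the rule leans toward spam, normalized per corpus.
        s_o = spam_pct / total if total else 0.0
        stats[rule] = (spam_pct, ham_pct, s_o)
    return stats


# Made-up example corpus: two hand-sorted spams, two hand-sorted nonspams.
corpus = [
    (True, {"RULE_A", "RULE_B"}),
    (True, {"RULE_A"}),
    (False, {"RULE_B"}),
    (False, set()),
]
for rule, (sp, hp, so) in sorted(compute_stats(corpus).items()):
    print(f"{rule:8} SPAM%={sp:5.1f} HAM%={hp:5.1f} S/O={so:.3f}")
```

The whole point is that is_spam comes from a human, not from SA.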


>  And is it
> possible to produce a report like this by domain name?
>   
By domain? Sure, create a separate corpus for each.
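If your hand-sorted mail is all in one pile, a minimal Python sketch
for splitting it might look like the following. It assumes each
message is a raw RFC 822 string and that the To: header is a good
enough proxy for the receiving domain; the split_by_domain helper is
hypothetical, and an envelope or Delivered-To address may suit your
setup better. Run it separately over the spam and the nonspam sets so
each domain gets its own hand-sorted pair.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses


def split_by_domain(raw_messages):
    """Group raw message strings by the domain(s) in their To: header."""
    by_domain = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() returns (realname, address) pairs from the header values.
        addrs = getaddresses(msg.get_all("To", []))
        domains = {addr.rpartition("@")[2].lower() for _, addr in addrs if "@" in addr}
        for domain in domains:
            by_domain[domain].append(raw)
    return dict(by_domain)


# Made-up nonspam examples.
ham = [
    "To: alice@example.com\nSubject: hi\n\nbody\n",
    "To: bob@example.org\nSubject: lunch\n\nbody\n",
]
for domain, msgs in sorted(split_by_domain(ham).items()):
    print(domain, len(msgs))
```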

That said, it sounds like you're thinking of using this to monitor your
live mail feeds.

It's impossible to produce these reports accurately on live email. You
must have a hand-sorted set of spam and nonspam to work with, so that
the tool knows for sure whether a rule is matching spam or nonspam.

If you try to build it off a live feed and use SA's marking as the spam
criterion, your statistics are useless. Any rule with a high enough
score would get "perfect" results: all the mail it matched would be
spam, and none would be nonspam. You have, essentially, created a
self-fulfilling prophecy. The higher a rule's score, the more likely
messages that match it will be tagged as spam, even if they're not
really spam.
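A toy Python simulation of that trap, using made-up messages and a
hypothetical rule whose score alone exceeds the threshold (5.0, SA's
default required_score, is assumed here). Every message the rule hits
gets tagged, so measuring the rule against SA's own verdicts reports it
as 100% spam, while the hand-sorted labels expose the false positive.

```python
THRESHOLD = 5.0   # assumed required_score (SA's default)
RULE_SCORE = 5.5  # hypothetical rule whose score alone exceeds the threshold

# (truly_spam, hits_rule, score_from_all_other_rules): made-up sample data
messages = [
    (True, True, 2.0),
    (True, True, 0.5),
    (False, True, 0.0),   # nonspam that happens to trigger the rule
    (True, False, 6.0),
    (False, False, 1.0),
]


def tagged_as_spam(hits_rule, other_score):
    """SA's verdict: total score at or above the threshold."""
    total = other_score + (RULE_SCORE if hits_rule else 0.0)
    return total >= THRESHOLD


def rule_spam_ratio(label_of):
    """Fraction of the rule's hits that the given labeling calls spam."""
    hits = [m for m in messages if m[1]]
    return sum(1 for m in hits if label_of(m)) / len(hits)


by_sa_verdict = rule_spam_ratio(lambda m: tagged_as_spam(m[1], m[2]))
by_hand_sort = rule_spam_ratio(lambda m: m[0])
print(f"labels from SA's own verdict: {by_sa_verdict:.2f}")  # 1.00
print(f"labels from hand sorting:     {by_hand_sort:.2f}")  # 0.67
```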
