"Dallas L. Engelken" <[EMAIL PROTECTED]> writes: > Can anyone help explain the STATISTICS.txt files a little deeper to me? > > STATISTICS.txt - rules > STATISTICS-set1.txt - rules + network tests > STATISTICS-set2.txt - rules + bayes > STATISTICS-set3.txt - rules + bayes + network tests > > this is what it looks like, but the false positives are much smaller in > STATISTICS-set1 than in STATISTICS.txt... so i dont see how adding > network tests can reduce false positives... the 50_rules.cf greatly > differ in size from STATISTICS.txt and STATISTICS-set1.txt, so i was > wonder what else is different to account for the reduction in false > positives?
Each set is tuned separately by the genetic algorithm (the size of the
files is not really related). Basically, with more tests available,
SpamAssassin can do a better job of optimizing the scores and rely less
on other, less accurate tests.

In other words, RBLs and other network tests do work, and in the overall
scheme of things they don't cause more false positives. Yes, they are
sometimes part of the odd (and hopefully extremely rare) false positive,
but hopefully no more often than any other tests. Of course, that holds
until a blacklist changes policy (read: Monkeys) or goes under while
designating the entire Internet as a spammer IP address (read:
Osirusoft). We try our best to avoid blacklists that are questionable or
seem apt to change policy without adequate warning (like a year).

Daniel
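To make the "separately tuned" part concrete: a rule in the scores file
can carry up to four scores, one per score set, and the GA optimizes each
column independently against its own corpus run. A rough sketch of what
such an entry looks like (EXAMPLE_RULE and the numbers are made up for
illustration, not taken from the distributed 50_scores.cf):

    # score <rule>  <set0> <set1> <set2> <set3>
    #   set 0: rules only            set 1: rules + network tests
    #   set 2: rules + Bayes         set 3: rules + Bayes + network tests
    score EXAMPLE_RULE 2.5 1.8 2.1 1.4

So when network tests or Bayes are enabled, the GA can shift weight onto
the more reliable evidence and pull the scores of weaker rules down,
which is how the extra tests end up reducing false positives rather than
adding to them.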