"Dallas L. Engelken" <[EMAIL PROTECTED]> writes: > Can anyone help explain the STATISTICS.txt files a little deeper to me? > > STATISTICS.txt - rules > STATISTICS-set1.txt - rules + network tests > STATISTICS-set2.txt - rules + bayes > STATISTICS-set3.txt - rules + bayes + network tests > > this is what it looks like, but the false positives are much smaller in > STATISTICS-set1 than in STATISTICS.txt... so i dont see how adding > network tests can reduce false positives... the 50_rules.cf greatly > differ in size from STATISTICS.txt and STATISTICS-set1.txt, so i was > wonder what else is different to account for the reduction in false > positives?
Each set is tuned separately by the genetic algorithm (the size of the
files is not really related). Basically, with more tests available,
SpamAssassin can do a better job of optimizing the scores and rely less
on other, less accurate tests.

In other words, RBLs and other network tests do work, and in the overall
scheme of things they don't cause more false positives. Yes, they are
sometimes part of the odd (and hopefully extremely rare) false positive,
but hopefully no more often than any other tests. Of course, that holds
until a blacklist changes policy (read: Monkeys) or goes under while
designating the entire Internet as a spammer IP address (read:
Osirusoft). We try our best to avoid blacklists that are questionable or
seem apt to change policy without adequate warning (like a year).

Daniel
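To make the "separately tuned" part concrete: a rule in the scores file
can carry up to four scores, one per score set, and the GA optimizes each
column independently against its own corpus run. A rough sketch of what
such an entry looks like (EXAMPLE_RULE and the numbers are made up for
illustration, not taken from the distributed 50_scores.cf):

    # score <rule>  <set0> <set1> <set2> <set3>
    #   set 0: rules only            set 1: rules + network tests
    #   set 2: rules + Bayes         set 3: rules + Bayes + network tests
    score EXAMPLE_RULE 2.5 1.8 2.1 1.4

So when network tests or Bayes are enabled, the GA can shift weight onto
the more reliable evidence and pull the scores of weaker rules down,
which is how the extra tests end up reducing false positives rather than
adding to them.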