Jason Haar writes:
> Justin Mason wrote:
> > However: it's important for SpamAssassin developers and mass-checkers to
> > get a "representative" feed of spam -- with all kinds of spam included --
> > so that the rules are measured against something close to reality.
>
> On a related note, we actually *stopped* using front-line RBLs as with
> them in place, we were no longer able to get true stats as to the actual
> flow of Spam/Ham into our sites. Which meant that we really couldn't
> tell how effective our antispam systems were being. The "broad axe" that
> is RBL meant that a single mail message coming from servers may be
> blocked dozens of times (as it retries), meaning that our stats would
> over-represent the effectiveness of front-line RBL methods. Now we just
> let it all hit SpamAssassin, and have simply upped the score on those
> RBLs we used to trust to reject directly, so that the Spam doesn't get
> any further. End result: no delivery changes - but better quality stats.
Yes, that's a closely related problem. Using front-line RBLs (or other
SMTP-time discard tactics like an early-talker test) distorts your view
of your incoming spam. Worse than that, you effectively have no way to
accurately estimate FP rates -- you have to guess based on rejection
figures added to the more accurate SpamAssassin-tagged corpora.

> Obviously you have to have over-speced your mail servers to be able to
> do this - something poor old Justin can't manage I think :-)

Yeah. If I could persuade someone to donate a server just for *my*
personal mail, that'd solve it, but in the meantime, not so much ;)

(Actually, we recently upgraded the RAM, so it looks like it can
probably cope with the volume again.)

> (FYI: picking a random user of ours and looking at all Internet email
> they received in Aug 2006 showed SA had >99% success rate at tagging
> Spam. 85% was quarantined (scores >10/5) and the rest tagged for the
> users to filter on. Also, ZERO ham misclassification - which is
> something certain commercial competitors to SpamAssassin are actually
> pretty bad at...)

wow, that's really good!

> Now if only it could deal with this storm of "VIiiagra"/"VIragra" spam
> that has been sneaking in... :-)

yep, working on those ;)

--j.
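
To illustrate the retry effect Jason describes, here is a minimal Python sketch, not anything taken from SpamAssassin or from either site's tooling. It assumes a hypothetical rejection log with one (client IP, envelope sender, recipient) tuple per rejected SMTP attempt; the field choice, the sample addresses, and the `count_rejections` helper are all made up for the example. Grouping on that tuple is only an approximation of "one message", since a reject issued before DATA never sees a Message-ID.

```python
from collections import Counter

# Hypothetical rejection log: one tuple per rejected SMTP attempt.
# The (client_ip, envelope_sender, recipient) layout is an assumption
# made for this sketch, not a real log format.
rejections = [
    ("192.0.2.10", "a@example.net", "user@example.org"),
    ("192.0.2.10", "a@example.net", "user@example.org"),  # retry
    ("192.0.2.10", "a@example.net", "user@example.org"),  # retry
    ("198.51.100.7", "b@example.com", "user@example.org"),
]


def count_rejections(attempts):
    """Return (raw_attempts, distinct_messages) for a list of attempts.

    Identical tuples are treated as retries of the same message.
    """
    per_message = Counter(attempts)
    return sum(per_message.values()), len(per_message)


raw, distinct = count_rejections(rejections)
print(f"raw rejections: {raw}, distinct messages: {distinct}")
```

Counting raw rejections credits the RBL with four blocked spams here, while only two distinct messages were actually stopped; that gap is the over-representation in the stats.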