Daryl C. W. O'Shea wrote:
On 04/03/06 09:56 PM, Gabriel Wachman wrote:
A colleague and I are writing a paper about a spam filter he developed.
We'd like to compare it against various open source filters, including
SpamAssassin. The methodology we are using is to train the filter on a
set of messages, and then test it on an independent set of messages. The
key is that the filter cannot update itself at all after training.
That's the key?!
Yes. I know it may sound strange from some people's perspective, but
there are good reasons we need to do it this way. We are comparing
several spam filters; in order to make claims about the performance of
any of the filters we need to evaluate a _fixed_ classifier on a test
set. If the classifier is not fixed, then our confidence intervals go
out the window. It actually helps SpamAssassin if we can do this: if we
can't, we will have to note in the paper that the SpamAssassin results
are not statistically robust, since it keeps updating its classifier as
it processes messages. Since SpamAssassin is so widely used and in
my experience performs very well, we would really like to include
results from it without any such caveats.
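
To make the statistical point concrete, here is a rough sketch (plain
Python, entirely made-up numbers) of the kind of interval we can only
justify when the classifier stays frozen: with a fixed classifier and an
independent test set, each message is an independent Bernoulli trial, so
a standard binomial (Wilson) interval around the observed error rate is
meaningful. If the filter keeps updating itself during the test run,
each message is effectively scored by a different classifier and that
calculation no longer applies.

    import math

    def wilson_interval(errors, n, z=1.96):
        # 95% Wilson score interval for the true error rate of a *fixed*
        # classifier evaluated on n independent test messages.
        p = float(errors) / n
        denom = 1.0 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return max(0.0, centre - half), min(1.0, centre + half)

    # hypothetical numbers: 14 mistakes on a 2000-message held-out test set
    print(wilson_interval(14, 2000))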
I hope that helps explain the situation. Regardless, our testing
methodology is really not up for discussion; I just want to know if
there is an easy way to do what we want.
Thank you all for your replies.
Gabriel