Re: training SpamAssassin without updating bayes*

2006-03-05 Thread Theo Van Dinter
On Sat, Mar 04, 2006 at 09:56:14PM -0500, Gabriel Wachman wrote:
> During training I run:
> sa-learn --dbpath $WORKDIR --ham $DATADIR/$message_dir
> (likewise for spam)
>
> During testing I run:
> spamassassin -t -p $PREFSPATH $DATADIR/$message_dir

You may want to look into mass-check. It's much

Re: training SpamAssassin without updating bayes*

2006-03-05 Thread mouss
Gabriel Wachman wrote:
>
> Yes. I know it may sound strange from some people's perspective, but
> there are good reasons we need to do it this way. We are comparing
> several spam filters; in order to make claims about the performance of
> any of the filters we need to evaluate a _fixed_ classi

Re: training SpamAssassin without updating bayes*

2006-03-05 Thread Theo Van Dinter
On Sat, Mar 04, 2006 at 10:50:19PM -0500, Daryl C. W. O'Shea wrote:
> Even with bayes_auto_learn disabled, the tokens' atimes are still
> updated. That's the way SpamAssassin works. That's what helps
> SpamAssassin's bayes implementation in being effective.

Well, sort of. The atime updates ar
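
For the fixed-classifier setup discussed in this thread, a user_prefs fragment along the following lines might be a starting point. This is only a sketch: the option names are standard SpamAssassin Bayes settings, but the thread does not confirm that they stop every write to the bayes* files, and token atime updates may still be journaled. The file is assumed to be the one passed via -p $PREFSPATH.

```
# Sketch only, not a confirmed recipe from this thread.
use_bayes 1

# Do not learn from messages as they are scanned.
bayes_auto_learn 0

# Do not expire old tokens during scanning.
bayes_auto_expire 0

# Assumption: 0 disables size-triggered journal syncs into the main database.
bayes_journal_max_size 0
```

If the database must stay identical across runs, testing against a fresh copy of the trained $WORKDIR each time may be a simpler safeguard.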

Re: training SpamAssassin without updating bayes*

2006-03-05 Thread Gabriel Wachman
Daryl C. W. O'Shea wrote: On 04/03/06 09:56 PM, Gabriel Wachman wrote: A colleague and I are writing a paper about a spam filter he developed. We'd like to compare it against various open source filters, including SpamAssassin. The methodology we are using is to train the filter on a set of mes

Re: training SpamAssassin without updating bayes*

2006-03-05 Thread jdow
From: "mouss" <[EMAIL PROTECTED]> Gabriel Wachman wrote: A colleague and I are writing a paper about a spam filter he developed. We'd like to compare it against various open source filters, including SpamAssassin. The methodology we are using is to train the filter on a set of messages, and

Re: training SpamAssassin without updating bayes*

2006-03-05 Thread mouss
Gabriel Wachman wrote:
> A colleague and I are writing a paper about a spam filter he developed.
> We'd like to compare it against various open source filters, including
> SpamAssassin. The methodology we are using is to train the filter on a
> set of messages, and then test it on an independent

Re: training SpamAssassin without updating bayes*

2006-03-04 Thread Daryl C. W. O'Shea
On 04/03/06 09:56 PM, Gabriel Wachman wrote: A colleague and I are writing a paper about a spam filter he developed. We'd like to compare it against various open source filters, including SpamAssassin. The methodology we are using is to train the filter on a set of messages, and then test it on a

Re: training SpamAssassin without updating bayes*

2006-03-04 Thread Loren Wilton
> During testing, I can see spamassassin create a "bayes_journal" file and
> write to it continuously. I understand this is spamassassin's way of

If the journal is only growing it isn't being learned from. Typically at some point if auto-learn were enabled one of the spam mail runs would take som

training SpamAssassin without updating bayes*

2006-03-04 Thread Gabriel Wachman
A colleague and I are writing a paper about a spam filter he developed. We'd like to compare it against various open source filters, including SpamAssassin. The methodology we are using is to train the filter on a set of messages, and then test it on an independent set of messages. The key is that