A colleague and I are writing a paper about a spam filter he developed. We'd like to compare it against various open source filters, including SpamAssassin. The methodology we are using is to train the filter on a set of messages, and then test it on an independent set of messages. The key is that the filter cannot update itself at all after training.
In my user_prefs:

    bayes_auto_learn 0
    bayes_learn_during_report 0
    bayes_path SOME_PATH

During training I run (likewise for spam):

    sa-learn --dbpath $WORKDIR --ham $DATADIR/$message_dir

During testing I run:

    spamassassin -t -p $PREFSPATH $DATADIR/$message_dir

I'm doing several training and testing runs, and each one uses a different database. For each run I set "SOME_PATH" appropriately and point spamassassin at that "user_prefs" with the -p switch, hence the variables in the command-line arguments. The testing run that matches a given training run must read the bayes_* files from that training run.

During testing, I can see spamassassin create a "bayes_journal" file and write to it continuously. I understand this is spamassassin's way of storing its updates to bayes_* temporarily until the updates are merged. My concern is that it's using bayes_journal in addition to bayes_toks and bayes_seen during testing, but I want it to use only the bayes_toks and bayes_seen generated during training.

Can someone tell me how to run spamassassin in testing mode, so that it only classifies messages, without making any updates or doing any learning?

Thank you,
Gabriel