Hi all,

I was recently given a list of 10,000 posts from an internet forum. Of those, 9,000 had been approved by the site's moderators and the remaining 1,000 were rejected. I was wondering if I could use this data set to play with Bayesian filtering in SpamAssassin.

I tried the following: I converted all posts to emails and then used sa-learn with --ham and --spam to train SpamAssassin. This seems to have worked fine (it produced some files in ~/.spamassassin with a total size of 1 MB).

Now I am trying to test some new posts to see whether SpamAssassin thinks they should be approved or not. I have written the following Perl code:
    my $spamassassin = Mail::SpamAssassin->new({
        require_rules      => 1,
        local_tests_only   => 1,
        userprefs_filename => "$ENV{HOME}/.spamassassin/user_prefs",
        userstate_dir      => "$ENV{HOME}/.spamassassin",
        rules_filename     => "$ENV{HOME}/.spamassassin/user_prefs",
    });
    my $status = $spamassassin->check($post);
    print $status->get_score, "\n";

When I run this, it always prints a score of zero. Here's what my ~/.spamassassin/user_prefs looks like:

    required_score 5
    use_learner 1
    use_bayes 1
    use_bayes_rules 1
    bayes_auto_learn 0
    allow_user_rules 1
    score BAYES_05 9

Could someone give me any pointers on how to make this work? All I want is to be able to use Bayesian filtering, and Bayesian filtering alone, without any other rules. Is there a document that describes how to do something like that? All I could find are documents describing how to make SpamAssassin work with other programs such as procmail or qmail.
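For reference, here is a minimal sketch of one way the "Bayes only" check might be wired up. It is not a confirmed fix. It rests on several assumptions: that the post is already in email form (headers, blank line, body), as it was for training; that rules_filename is left unset so the stock rule files, which define the BAYES_* tests, are loaded from their default location; and that the text is parsed into a message object before calling check(). The helper names bayes_bucket and classify_post are made up for illustration. Rather than disabling the other rules, the sketch simply ignores them and reports which BAYES_NN rule fired.

```perl
use strict;
use warnings;

# Pick the BAYES_NN rule name out of SpamAssassin's comma-separated
# list of rule hits. Returns undef when no Bayes rule fired.
sub bayes_bucket {
    my ($names) = @_;
    my ($bucket) = grep { /^BAYES_\d+$/ } split /\s*,\s*/, ($names // '');
    return $bucket;
}

# Hypothetical helper: run one post (already in email form) through
# SpamAssassin and return only the Bayes rule that hit, if any.
sub classify_post {
    my ($email_text) = @_;
    require Mail::SpamAssassin;    # loaded at call time; must be installed

    # Assumption: no rules_filename here, so the default rule set
    # (including the BAYES_* definitions) is used.
    my $sa = Mail::SpamAssassin->new({
        local_tests_only   => 1,
        userprefs_filename => "$ENV{HOME}/.spamassassin/user_prefs",
        userstate_dir      => "$ENV{HOME}/.spamassassin",
    });

    my $mail   = $sa->parse($email_text);    # build a message object first
    my $status = $sa->check($mail);
    my $bucket = bayes_bucket($status->get_names_of_tests_hit);

    # Release the per-message resources held by SpamAssassin.
    $status->finish;
    $mail->finish;
    return $bucket;
}
```

It could then be called as print classify_post($post) // 'no Bayes rule hit', "\n"; for each new post. When checking many posts, it would be better to create the Mail::SpamAssassin object once and reuse it. Note also that in the stock rules the number in the rule name tracks the estimated spam probability, so BAYES_99 is the "almost certainly spam" bucket and BAYES_05 is a low-probability one. A custom score line would therefore more plausibly target the high buckets.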