Hi all,

I was recently given a list of 10,000 posts from an internet forum.
Of those, 9,000 had been approved by the site's moderators and the
remaining 1,000 were rejected. I was wondering if I could use this
data set to play with Bayesian filtering in SpamAssassin. I tried the
following: I converted all posts to emails and then used sa-learn
with --ham and --spam to train SpamAssassin. This seems to have worked
fine (it produced some files in ~/.spamassassin, about 1 MB in total).
Now I am trying to test some new posts to see whether SpamAssassin
thinks they should be approved or not. I have written the following
Perl code:

    my $spamassassin = Mail::SpamAssassin->new({
        require_rules      => 1,
        local_tests_only   => 1,
        userprefs_filename => "$ENV{HOME}/.spamassassin/user_prefs",
        userstate_dir      => "$ENV{HOME}/.spamassassin",
        rules_filename     => "$ENV{HOME}/.spamassassin/user_prefs",
    });
    my $status = $spamassassin->check($post);
    print $status->get_score,"\n";

When I run this, it always prints a score of zero. Here is what my
~/.spamassassin/user_prefs looks like:

    required_score    5
    use_learner       1
    use_bayes         1
    use_bayes_rules   1
    bayes_auto_learn  0
    allow_user_rules  1
    score BAYES_05    9

Could someone give me any pointers on how to make this work? All I
want is to use Bayesian filtering, and Bayesian filtering alone,
without any other rules. Is there a document that describes how to do
something like that? All I could find are documents describing how to
make SpamAssassin work with other programs such as procmail, qmail,
etc.
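
P.S. In case it helps with diagnosing this, the training step was
roughly along these lines. This is a sketch rather than my exact
commands: the train_bayes name, the SA_LEARN override and the two
directory arguments are placeholders made up for illustration, and it
assumes one email file per post.

```shell
# Sketch of the training step; the directory arguments are placeholders.
# SA_LEARN (an assumption, not a SpamAssassin setting) lets the sa-learn
# binary be swapped out, e.g. for a different install path.
train_bayes() {
    ham_dir=$1     # approved posts, one email file each
    spam_dir=$2    # rejected posts, one email file each

    # Feed the approved posts to the Bayes database as ham.
    "${SA_LEARN:-sa-learn}" --ham "$ham_dir" || return 1

    # Feed the rejected posts to the Bayes database as spam.
    "${SA_LEARN:-sa-learn}" --spam "$spam_dir" || return 1

    # Dump the database counters (nspam, nham, ...) as a sanity check.
    "${SA_LEARN:-sa-learn}" --dump magic
}
```

I would call it as "train_bayes approved/ rejected/". The last command
should list the nspam and nham counters, which is a quick way to check
that the Bayes database really contains the trained messages.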
