I am using per-user Bayes DBs, and I'm not sure what good it's doing me. I initialized the DB with good and bad messages, and I run any false positives and false negatives through sa-learn. I've also taken to feeding all spam through sa-learn, because I thought I remembered reading that this helps reinforce which messages are bad (and that sa-learn ignores any messages it has already learned from via auto-learn, which I think is turned on).

So we've been doing this for about a year and I still have quite a number of false-negatives (i.e. spam that gets through) - over 100 per day. Maybe I don't quite understand how it's supposed to work. Here's an example:

    >From [EMAIL PROTECTED] Mon Jun 12 23:24:55 2006
    X-Spam-Status: No, score=1.9 required=5.0 tests=ALL_TRUSTED,BAYES_99,
            DNS_FROM_RFC_ABUSE,HTML_MESSAGE autolearn=no version=3.1.3
    Reply-To: "Programmer's Paradise" <[EMAIL PROTECTED]>
    From: "Programmer's Paradise" <[EMAIL PROTECTED]>

Mail from this user still gets through all the time:

    >From [EMAIL PROTECTED] Wed Mar 21 14:23:05 2007
    X-Spam-Status: No, score=1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_99,
            FROM_EXCESS_QP,HTML_MESSAGE autolearn=no version=3.1.8
    Reply-To: [EMAIL PROTECTED]
    From: "=?iso-8859-1?Q?Programmer's_Paradise?=" <[EMAIL PROTECTED]>

I keep all of the spam/ham I've sent through sa-learn (why I'm not sure, but I do have it) and I have had at least 52 of these emails, yet they still get through:

    </home/USER/mail> # grep From.*pparadise sa-spam.done | wc -l
    52
    </home/USER/mail> # grep From.*pparadise sa-ham.done | wc -l
    0

I also get plenty of emails with obvious variations of spellings for viagra and all of the other popular spam drugs, lots of spelling variations for various body parts and sexual acts, and they still get through.

I get very few false-positives, probably 1 a month or a little less, so I'm happy in that regard.
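
For anyone who wants to compare the two X-Spam-Status headers quoted above side by side, here is a minimal sketch in Python that splits a header value into its verdict, score, threshold, and list of rules that fired. It is not part of my setup or of SpamAssassin; the function name parse_spam_status and the returned dictionary keys are made up for illustration, and it assumes the "score=... required=... tests=..." layout shown in the headers above.

```python
import re


def parse_spam_status(value):
    """Split an X-Spam-Status header value into its parts.

    Hypothetical helper for illustration only; not a SpamAssassin API.
    Assumes the 'score=... required=... tests=...' layout seen above.
    """
    # Headers may be folded across lines; collapse all whitespace runs.
    flat = " ".join(value.split())

    # The verdict ("Yes" or "No") is everything before the first comma.
    verdict = flat.split(",", 1)[0].strip()

    number = r"(-?\d+(?:\.\d+)?)"
    score = float(re.search(r"\bscore=" + number, flat).group(1))
    required = float(re.search(r"\brequired=" + number, flat).group(1))

    # Folding can leave ", " inside the tests list, so capture everything
    # up to the next " key=" pair (or the end of the string).
    match = re.search(r"\btests=(.*?)(?:\s+\w+=|$)", flat)
    tests = [t.strip() for t in match.group(1).split(",") if t.strip()]

    return {
        "verdict": verdict,
        "score": score,
        "required": required,
        "tests": tests,
    }


if __name__ == "__main__":
    header = (
        "No, score=1.9 required=5.0 tests=ALL_TRUSTED,BAYES_99,\n"
        "        DNS_FROM_RFC_ABUSE,HTML_MESSAGE autolearn=no version=3.1.3"
    )
    info = parse_spam_status(header)
    print(info["verdict"], info["score"], info["required"])
    for name in info["tests"]:
        print("  hit:", name)
```

Run against either header above, the list of hits includes BAYES_99 while the total score stays below required=5.0, which is the part I don't understand.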

Some details:

    OS: FC5 (2.6.17)
    SpamAssassin version 3.1.8
      running on Perl version 5.8.8
    spamd: run via init.d script

SpamAssassin is invoked from .procmailrc via:

    :0fw:
    * < 256000
    | spamc

sa-learn run nightly as root via cron job:

    su USER -s /bin/sh -c 'sa-learn --spam --mbox --showdots ~/mail/sa-spam'

    <~> # su USER -s /bin/sh -c 'sa-learn --dump magic'
    0.000          0          3          0  non-token data: bayes db version
    0.000          0      17393          0  non-token data: nspam
    0.000          0        565          0  non-token data: nham
    0.000          0     145811          0  non-token data: ntokens
    0.000          0 1173869033          0  non-token data: oldest atime
    0.000          0 1174829674          0  non-token data: newest atime
    0.000          0 1174826915          0  non-token data: last journal sync atime
    0.000          0 1174559910          0  non-token data: last expiry atime
    0.000          0     691200          0  non-token data: last expire atime delta
    0.000          0      76338          0  non-token data: last expire reduction count

Any help in my understanding of what SA is supposed to do, as well as what I may be doing wrong, is much appreciated.

Thanks!

--
Regards,

joe
Joe Casadonte
[EMAIL PROTECTED]