Help with per-user sa-learn

Joe Casadonte Sun, 25 Mar 2007 06:10:38 -0800

I am using per-user Bayes DBs, and I'm not sure what good they're
doing me.  I initialized the DB with good and bad messages, and I run
any false positives and false negatives through sa-learn.  I've also
taken to feeding all spam through sa-learn, because I thought I
remembered reading that this would help reinforce which messages are
bad (and that it would ignore any messages it had already learned from
via auto-learn, which I think is turned on).


So we've been doing this for about a year, and I still get quite a
few false negatives (i.e. spam that gets through): over 100 per day.
Maybe I don't quite understand how it's supposed to work.  Here's an
example:

From [EMAIL PROTECTED]  Mon Jun 12 23:24:55 2006
X-Spam-Status: No, score=1.9 required=5.0 tests=ALL_TRUSTED,BAYES_99,
        DNS_FROM_RFC_ABUSE,HTML_MESSAGE autolearn=no version=3.1.3
Reply-To: "Programmer's Paradise" <[EMAIL PROTECTED]>
From: "Programmer's Paradise" <[EMAIL PROTECTED]>


Mail from this sender still gets through all the time:

From [EMAIL PROTECTED]  Wed Mar 21 14:23:05 2007
X-Spam-Status: No, score=1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_99,
        FROM_EXCESS_QP,HTML_MESSAGE autolearn=no version=3.1.8
Reply-To: [EMAIL PROTECTED]
From: "=?iso-8859-1?Q?Programmer's_Paradise?=" <[EMAIL PROTECTED]>
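To compare headers like the two above side by side, it may help to pull
the score, the threshold and the list of tests out of the folded
X-Spam-Status header.  Below is a minimal sketch in POSIX sh and awk; it
assumes one message's headers arrive on stdin, and the function name
spam_status_summary is hypothetical, not part of SpamAssassin.

```shell
#!/bin/sh
# spam_status_summary (hypothetical helper): read one message's headers on
# stdin, unfold the possibly multi-line X-Spam-Status header, and print its
# score, threshold, and the list of tests that fired.
spam_status_summary() {
    awk '
        # A blank line ends the header block; END still runs after exit.
        /^$/ { exit }
        # A line not starting with whitespace begins a new header:
        # remember whether it is the X-Spam-Status header.
        /^[^ \t]/ { in_status = ($0 ~ /^X-Spam-Status:/) }
        # Accumulate X-Spam-Status and its folded continuation lines.
        in_status { status = status " " $0 }
        END {
            # Collapse whitespace left by unfolding, then rejoin the
            # comma-separated test list that was split across lines.
            gsub(/[ \t]+/, " ", status)
            gsub(/, /, ",", status)
            if (match(status, /score=[^ ]+/))    print substr(status, RSTART, RLENGTH)
            if (match(status, /required=[^ ]+/)) print substr(status, RSTART, RLENGTH)
            if (match(status, /tests=[^ ]+/))    print substr(status, RSTART, RLENGTH)
        }
    '
}
```

For example, `spam_status_summary < message.eml` (the file name is a
placeholder) on the first header above would print score=1.9,
required=5.0 and tests=ALL_TRUSTED,BAYES_99,DNS_FROM_RFC_ABUSE,HTML_MESSAGE
on separate lines.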


I keep all of the spam/ham I've sent through sa-learn (I'm not sure
why, but I do have it), and I have had at least 52 of these emails,
yet they still get through:

</home/USER/mail> # grep From.*pparadise sa-spam.done | wc -l
52

</home/USER/mail> # grep From.*pparadise sa-ham.done | wc -l
0
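The per-archive counts above can also be produced in one pass.  This is a
sketch under a few assumptions: the archives are mbox files, only lines
beginning with "From:" are counted (so envelope "From " lines are skipped,
though a forwarded "From:" line inside a body would still count), and the
helper name count_sender is hypothetical.  Quoting the pattern also keeps
the shell from trying to glob-expand ".*".

```shell
#!/bin/sh
# count_sender (hypothetical helper): for each mbox file given after the
# pattern, print how many "From:" header lines match the pattern in $1.
count_sender() {
    pattern=$1
    shift
    for mbox in "$@"; do
        # -c counts matching lines, -i ignores case.  grep exits 1 when
        # nothing matches but still prints 0, so "|| true" keeps the
        # function going under "set -e".
        n=$(grep -ci "^From:.*$pattern" "$mbox") || true
        printf '%s: %s\n' "$mbox" "$n"
    done
}
```

For example: count_sender pparadise sa-spam.done sa-ham.done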


I also get plenty of emails with obvious misspellings of viagra and
the other popular spam drugs, or with spelling variations for various
body parts and sexual acts, and they still get through.  I get very
few false positives, probably one a month or a little less, so I'm
happy in that regard.


Some details:

OS: FC5 (2.6.17)

SpamAssassin version 3.1.8
  running on Perl version 5.8.8

spamd: run via init.d script


SpamAssassin is invoked from .procmailrc via:
:0fw:
* < 256000
| spamc


sa-learn is run nightly as root via a cron job:

su USER -s /bin/sh -c 'sa-learn --spam --mbox --showdots ~/mail/sa-spam'


<~> # su USER -s /bin/sh -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0      17393          0  non-token data: nspam
0.000          0        565          0  non-token data: nham
0.000          0     145811          0  non-token data: ntokens
0.000          0 1173869033          0  non-token data: oldest atime
0.000          0 1174829674          0  non-token data: newest atime
0.000          0 1174826915          0  non-token data: last journal sync atime
0.000          0 1174559910          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0      76338          0  non-token data: last expire reduction count
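The nspam and nham lines in this dump show how many spam and ham messages
the Bayes DB has been trained on.  A small sketch for pulling them out,
assuming the output format shown above (the count in the third column, the
label in the last field); bayes_counts is a hypothetical name.

```shell
#!/bin/sh
# bayes_counts (hypothetical helper): read "sa-learn --dump magic" output
# on stdin and print the learned spam and ham message counts.
bayes_counts() {
    awk '
        # Match on the last field so trailing whitespace does not matter;
        # the count itself is the third column.
        $(NF-1) == "data:" && $NF == "nspam" { nspam = $3 }
        $(NF-1) == "data:" && $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # Skip the ratio when no ham has been learned (avoid dividing by 0).
            if (nham > 0) printf "spam:ham ratio = %.1f:1\n", nspam / nham
        }
    '
}
```

For example, su USER -s /bin/sh -c 'sa-learn --dump magic' | bayes_counts
would, with the numbers above, print nspam=17393 nham=565 and a ratio of
about 30.8:1.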


Any help understanding what SA is supposed to do, as well as what I
may be doing wrong, is much appreciated.  Thanks!

--
Regards,


joe
Joe Casadonte
[EMAIL PROTECTED]
