Hi, I am running SpamAssassin version 2.63 with MIMEDefang v 1.438. Lately a lot of spam has been getting through to the users on the network.
Every night I run this command from cron so that it will learn the spam messages:

sa-learn --spam -C /etc/mail/spamassassin --showdots --mbox /var/mail/bad-mail
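Step 1) Don't pass arguments you don't need. Ditch the -C parameter unless you have a reason to use it. You're passing it an obviously incorrect value: -C sets the standard rules directory, and /etc/mail/spamassassin is the site rules directory, not the standard one.

In the case of sa-learn this shouldn't cause problems, but SA already reads the site rules directory on its own, so pointing -C at it makes SA parse the same config directory twice, which is not a good idea. Without it, the command is simply:

sa-learn --spam --showdots --mbox /var/mail/bad-mail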
My question is: how can I verify that SpamAssassin is actually learning from the emails that are forwarded to the bad-mail mailbox?
First, simply learning from forwarded messages is a VERY bad idea. Make sure the messages in that mailbox are NOT just forwarded emails.
Forwarding doesn't resend the same message; it creates a new message quoting the original. sa-learn studies the entire message, not just the body, so the From:, To:, and Received: headers, the User-Agent: header, the MIME encodings, etc. all need to be the same as in the spam that actually arrived. Forwarding preserves none of this.
I know it sounds harsh, but let's face it, if you feed forwards to SA, you're not training SA to recognize spam as spam, but to recognize any and all forwarded messages from your users as spam. Highly undesirable.
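If you want to check what is actually sitting in bad-mail before training on it, a rough Python sketch like the following can flag messages that look like inline forwards. It assumes the mailbox is in mbox format, and the subject prefixes and "Original Message" separators it looks for are guesses at what common mail clients insert, not an exhaustive list. Anything carrying a message/rfc822 attachment is treated as forwarded-as-attachment and left alone.

```python
import mailbox
import re
from email.message import Message

# Heuristics (assumptions, not exhaustive): subject prefixes and body
# separators that common mail clients add when forwarding inline.
FWD_SUBJECT = re.compile(r"^\s*fwd?\s*:", re.IGNORECASE)
FWD_BODY_MARKERS = (
    "-------- Original Message --------",
    "---------- Forwarded message ---------",
    "-----Original Message-----",
)


def has_rfc822_attachment(msg: Message) -> bool:
    """True if msg carries another message as a message/rfc822 part."""
    return any(part.get_content_type() == "message/rfc822" for part in msg.walk())


def part_text(part: Message) -> str:
    """Decode a text part's payload, falling back to latin-1 for odd charsets."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "latin-1"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode("latin-1", errors="replace")


def looks_like_inline_forward(msg: Message) -> bool:
    """Guess whether msg is an inline forward rather than the original spam."""
    if has_rfc822_attachment(msg):
        return False  # forwarded as an attachment; the original is recoverable
    if FWD_SUBJECT.match(str(msg.get("Subject", ""))):
        return True
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            text = part_text(part)
            if any(marker in text for marker in FWD_BODY_MARKERS):
                return True
    return False


def report(mbox_path: str) -> list[int]:
    """Return the positions of messages in an mbox that look like inline forwards."""
    box = mailbox.mbox(mbox_path)
    try:
        return [i for i, msg in enumerate(box) if looks_like_inline_forward(msg)]
    finally:
        box.close()
```

For example, report("/var/mail/bad-mail") returns the positions of suspicious messages, which you could review by hand before the nightly sa-learn run. Being a heuristic, it will miss forwards from clients that use other markers.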
The best approach I've seen is to have users use their mail client's "forward as attachment" feature, then run the mailbox through a script that extracts the attached messages and feeds those to sa-learn.
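A minimal Python sketch of such a script, assuming the forwards land in an mbox-format mailbox and each spam arrives as a message/rfc822 attachment. The function names are hypothetical; the attached originals are copied into a second mbox that sa-learn can read with --mbox.

```python
import mailbox
from email.message import Message


def extract_attached_messages(msg: Message) -> list[Message]:
    """Return the messages attached to msg as message/rfc822 parts.

    Recurses through nested multiparts of the wrapper, but never into the
    attached message itself, so a spam that happens to contain its own
    message/rfc822 part is returned once, whole.
    """
    found: list[Message] = []
    if not msg.is_multipart():
        return found
    for part in msg.get_payload():
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload()  # the parser stores it as a 1-item list
            if isinstance(inner, list) and inner:
                found.append(inner[0])
        elif part.is_multipart():
            found.extend(extract_attached_messages(part))
    return found


def build_training_mbox(forwards_path: str, output_path: str) -> int:
    """Append every attached original from forwards_path to output_path.

    Both files are mbox format; returns the number of messages written.
    """
    src = mailbox.mbox(forwards_path)
    dst = mailbox.mbox(output_path)
    written = 0
    try:
        for forward in src:
            for original in extract_attached_messages(forward):
                dst.add(original)
                written += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return written
```

The nightly job would then train on the output instead, e.g. sa-learn --spam --showdots --mbox /var/mail/bad-mail-extracted (a hypothetical path). The sketch appends to the output file, so empty it and the source mailbox after each successful run; sa-learn normally skips messages it has already learned, so repeats are mostly wasted work. Some clients still rewrite headers when attaching, so spot-check a few extracted messages.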
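Also, if you aren't using a global bayes_path statement, make sure this cron job runs as the same user SpamAssassin runs as when it scans mail, since without bayes_path each user gets its own Bayes database under its home directory. On most Linux boxes, jobs in /etc/cron.daily run as root, while the mail filter typically runs as a different user (with MIMEDefang, usually its own account such as "defang"), and the two often have different home directories.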
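To see why the user matters, here is a small Python sketch that works out where each account's default Bayes database would live. It assumes no bayes_path is set, so SpamAssassin falls back to its default of ~/.spamassassin/bayes in the home directory of whoever runs it; the function names are hypothetical and "defang" below is only an example account name.

```python
import os
import pwd

# Assumption: with no bayes_path in the config, SpamAssassin uses its default
# of ~/.spamassassin/bayes, expanded in the home of whichever user runs it.
DEFAULT_BAYES_SUBPATH = os.path.join(".spamassassin", "bayes")


def default_bayes_prefix(username: str) -> str:
    """Default Bayes database path prefix for a local account (KeyError if unknown)."""
    home = pwd.getpwnam(username).pw_dir
    return os.path.join(home, DEFAULT_BAYES_SUBPATH)


def same_bayes_db(user_a: str, user_b: str) -> bool:
    """True if both accounts would read and write the same default Bayes files."""
    path_a = os.path.realpath(default_bayes_prefix(user_a))
    path_b = os.path.realpath(default_bayes_prefix(user_b))
    return path_a == path_b
```

For instance, same_bayes_db("root", "defang") returning False would mean a nightly sa-learn run as root is training a database the filter never reads. Setting a single bayes_path that both users can access avoids the question entirely.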