On Wed, 25 Apr 2007, Arik Raffael Funke wrote:

> I was wondering if it has any negative effects on my Bayes
> database if I regularly learn all spam/ham messages via a cron
> job. Sa-learn skips already learned messages. Am I thus right to
> assume that apart from the relatively high CPU load there are no
> drawbacks? Or should I keep a separate folder for "new" spam/ham?
> 
> I.e. what about expiring tags, etc. Sa-learn would routinely
> re-encounter 5 year-old spam...

Here's my two cents:

(1) Keep your training corpus around. It will help you recover from a
corrupted database and mislearning. In other words, don't delete
messages once they are learned.

(2) I have SpamAssassin-SPAM and SpamAssassin-HAM folders set up where
users file messages to be learned. Periodically (monthly) I rotate
them to keep the size manageable and to reduce the burden of sa-learn
rescanning old messages.

(3) Only give sa-learn a training folder if it has been modified in
the last couple of days. There is no need to have it continually scan
a mailbox where nothing has changed.
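
To illustrate point (3), here is a minimal sketch of a cron-driven
wrapper that only hands sa-learn a folder if its modification time is
recent. It is not the script linked below. The folder paths, the
two-day cutoff, and the function names are assumptions made up for the
example, and it assumes the training folders are mbox files (hence
sa-learn's --mbox option).

```python
import os
import subprocess
import time

# Hypothetical folder locations; adjust to your own mail layout.
TRAINING_FOLDERS = {
    "spam": "/var/mail/SpamAssassin-SPAM",
    "ham": "/var/mail/SpamAssassin-HAM",
}
MAX_AGE_DAYS = 2  # assumed meaning of "the last couple of days"


def recently_modified(path, max_age_days=MAX_AGE_DAYS, now=None):
    """Return True if path exists and was modified within max_age_days."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) <= max_age_days * 86400


def build_command(kind, path):
    """Build the sa-learn command line for an mbox-format folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", path]


def learn_changed_folders(folders=TRAINING_FOLDERS, run=subprocess.run):
    """Run sa-learn only on recently changed folders; return those learned."""
    learned = []
    for kind, path in folders.items():
        if recently_modified(path):
            run(build_command(kind, path), check=True)
            learned.append(path)
    return learned


if __name__ == "__main__":
    learn_changed_folders()
```

Since sa-learn already skips messages it has seen, the mtime check
only saves the cost of rescanning an unchanged mailbox; it does not
change what ends up in the Bayes database.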

You may want to look at my learn script, which I run from cron.daily:

  http://www.impsec.org/~jhardin/antispam/


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  It is sadly humorous that those who are the most shrilly vocal
  about bemoaning the increasing violations of civil liberties by
  the federal government and comparing the president to Hitler are
  also those who are working hardest to ensure the citizens of our
  nation are disarmed and unable to effectively resist that same
  government. Who do these people think will protect them from the
  Jackbooted Thugs they are so worried about?
-----------------------------------------------------------------------
 559 days until the Presidential Election
