On Wed, 25 Apr 2007, Arik Raffael Funke wrote:

> I was wondering if it has any negative effects on my Bayes
> database if I regularly learn all spam/ham messages via a cron
> job. Sa-learn skips already learned messages. Am I thus right to
> assume that apart from the relatively high CPU load there are no
> drawbacks? Or should I keep a separate folder for "new" spam/ham?
>
> I.e. what about expiring tags, etc. Sa-learn would routinely
> re-encounter 5 year-old spam...

Here's my two cents:

(1) Keep your training corpus around. It will help you recover from a corrupted database or from mislearning. In other words, don't delete messages once they are learned.

(2) I have SpamAssassin-SPAM and SpamAssassin-HAM folders set up for users to file training messages into. Periodically (monthly) I rotate them to keep the size manageable and to reduce the burden of sa-learn rescanning old messages.

(3) Only give sa-learn a training folder that has been modified in the last couple of days. There is no need to have it continually scan a mailbox where nothing has changed.

You may want to look at my learn script, which I run from cron.daily:

http://www.impsec.org/~jhardin/antispam/

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
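
For illustration, point (3) could be sketched roughly like this in Python. This is not the learn script linked above, just a minimal sketch under some assumptions: the training folders are mbox files, the function names recently_modified and sa_learn_commands are made up for this example, and "a couple of days" is taken to mean two days. It only builds the sa-learn command lines; it does not run them.

```python
import os
import time
from pathlib import Path


def recently_modified(folders, max_age_days=2, now=None):
    """Return the folders that exist and were modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return [
        f for f in folders
        if Path(f).exists() and Path(f).stat().st_mtime >= cutoff
    ]


def sa_learn_commands(spam_folders, ham_folders, max_age_days=2, now=None):
    """Build sa-learn command lines for recently changed mbox folders only."""
    cmds = []
    for kind, folders in (("--spam", spam_folders), ("--ham", ham_folders)):
        for f in recently_modified(folders, max_age_days, now):
            # Assumption: each training folder is a single mbox file.
            cmds.append(["sa-learn", kind, "--mbox", str(f)])
    return cmds
```

From a daily cron job, each returned command could then be passed to subprocess.run(cmd, check=True). Folders that have not changed are skipped entirely, so sa-learn never has to rescan them.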