Re: Any drawbacks of cron-scheduled bayesian learning?

Faisal N Jawdat Wed, 25 Apr 2007 09:43:05 -0700

On Apr 25, 2007, at 5:49 AM, Arik Raffael Funke wrote:

I was wondering if it has any negative effects on my Bayes database if I regularly learn all spam/ham messages via a cron job.
Sa-learn skips already learned messages. Am I thus right to assume that apart from the relatively high CPU load there are no drawbacks? Or should I keep a separate folder for "new" spam/ham?


I did this for a while and didn't find any problems.

I'm using Maildir, and I only trained on the cur folders, not the new folders. In theory this would prevent me from training on something that had come in mis-filed (so long as I remembered to quit my mail client at night).
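
For illustration only, here is a minimal Python sketch of the cur-only idea. It is not the sa-harvest script linked below. The folder layout is an assumption: the inbox is treated as ham and a hypothetical Maildir++ subfolder named ".Spam" holds the spam. The function name cur_training_commands is also made up for this example. It only builds the sa-learn command lines; nothing is run.

```python
import os
import shlex
from pathlib import PurePosixPath


def cur_training_commands(maildir, spam_folder=".Spam"):
    """Build sa-learn commands that train on cur/ only, never on new/.

    Assumptions (not from the original post): the top-level Maildir is
    ham, and `spam_folder` is a Maildir++ subfolder holding spam.
    """
    root = PurePosixPath(maildir)
    return [
        # Messages the mail client has already seen in the inbox.
        ["sa-learn", "--ham", str(root / "cur")],
        # Messages already seen in the (hypothetical) spam folder.
        ["sa-learn", "--spam", str(root / spam_folder / "cur")],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be
    # reviewed before going into a cron job.
    for cmd in cur_training_commands(os.path.expanduser("~/Maildir")):
        print(shlex.join(cmd))
```

Leaving out new/ means a message is only learned after the mail client has moved it into cur/, which is what gives you the chance to refile it first.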


See here for details and a script to do this:

http://www.faisal.com/software/sa-harvest/

Note that this script will also attempt to rebuild your whitelist (all the code after the 'sa-learn --dump magic'). This has some downsides, and turns out to be less useful with modern SpamAssassin, so I'm reworking the script to break out the whitelist code into a separate script.

That said, I keep a rolling 1 month corpus of spam, so it's easy to retrain when I need to. I stopped doing full retrains on cron, and at this point I only retrain on messages that were misfiled. See:


http://www.faisal.com/software/sa-harvest/quicktrain.xhtml

If you're doing any of this on a shared system, my one bit of advice is to set up the cron to use 'batch' and 'nice'.
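
One possible way to combine the two, sketched in Python under stated assumptions: the training command is prefixed with 'nice -n 19' and handed to 'batch' on standard input, so it starts only when the system load allows and then runs at low priority. The helper names (niced, batch_job_text, submit_to_batch) and the niceness value are hypothetical, and this assumes 'batch' from the at package is installed. A plain crontab line piping the same text into batch would do the same job.

```python
import shlex
import subprocess


def niced(cmd, niceness=19):
    """Prefix a command with nice so it runs at low CPU priority."""
    return ["nice", "-n", str(niceness), *cmd]


def batch_job_text(cmd, niceness=19):
    """Return the shell line that will be fed to batch on stdin."""
    return shlex.join(niced(cmd, niceness)) + "\n"


def submit_to_batch(cmd, niceness=19):
    """Queue the niced command with batch (assumes batch is installed).

    batch reads the job from standard input and starts it once the
    system load drops below its configured threshold.
    """
    subprocess.run(
        ["batch"],
        input=batch_job_text(cmd, niceness),
        text=True,
        check=True,
    )
```

For example, submit_to_batch(["sa-learn", "--spam", "/home/u/Maildir/.Spam/cur"]) would queue the spam training run; the path here is only a placeholder.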


-faisal
