On Sat, 09 Jan 2010 16:24:56 +0100 Cecil Westerhof <ce...@decebal.nl> wrote:
> Jeff Mincy <j...@delphioutpost.com> writes:
>
> > I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn
> > takes more time with 3.2.5 as it took with 3.0.4. Can this be true?
> >
> > It is not a problem, because it is done by cron-tab, but I am
> > just curious.
> >
> > You can use spamc -L spam/ham to learn messages. Spamc -L is faster
> > than sa-learn. The spamd daemon needs to be started with
> > --allow-tell.
>
> That is not really an answer on my question. ;-)
>
> ...
> So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
> code. Beside taking care of an empty directory, I also need to
> implement the feedback given by sa-learn.)

It's not really surprising: sa-learn doesn't have to initialize for
each individual mail, so spamc is just extra overhead.

> > You can try using bayes_learn_to_journal - and do a separate
> > sa-learn --sync job in cron. Learning to the journal is faster.
>
> I'll look into that.

I wouldn't bother setting that just to speed up learning. AFAIK the
point of bayes_learn_to_journal is to prevent autolearning from slowing
down classification. The gdbm backend uses a simple reader-writer lock,
so updating token counts locks all the other spamd processes out of the
database. If you have enough active spamd processes to justify it,
learning to the journal avoids lock contention. The downside is that
the updates don't take effect until the sync.

My guess is that it doesn't really speed up learning; it just defers
some of the work until the sync, and there's not much point in that,
since you could just defer the sa-learn.
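
For reference, if you did want to try learning to the journal, a minimal
sketch of the setup would look something like the following. Only the
bayes_learn_to_journal option and sa-learn --sync come from this thread;
the file name local.cf and the 15-minute schedule are assumptions for
illustration, and the job should run as the user that owns the Bayes
database.

```
# local.cf (assumed location of your site-wide SpamAssassin config)
# Queue Bayes updates in the journal instead of writing token counts
# to the database straight away.
bayes_learn_to_journal 1

# crontab entry (schedule is an arbitrary example)
# Fold the queued journal entries into the database; learned tokens
# only take effect after this runs.
*/15 * * * * sa-learn --sync
```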