On Sat, 09 Jan 2010 16:24:56 +0100
Cecil Westerhof <ce...@decebal.nl> wrote:

> Jeff Mincy <j...@delphioutpost.com> writes:
> 
> >    I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn
> > takes more time with 3.2.5 than it took with 3.0.4. Can this be true?
> >    
> >    It is not a problem, because it is done by cron-tab, but I am
> > just curious.
> >
> > You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
> > than sa-learn.  The spamd daemon needs to be started with
> > --allow-tell.
> 
> That is not really an answer to my question. ;-)
> 
> ...
> So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
> code. Besides taking care of an empty directory, I also need to
> implement the feedback given by sa-learn.)

It's not really surprising. sa-learn doesn't have to initialize for
each individual mail, so going through spamc is just extra overhead.
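
For the cron job, a single sa-learn run per directory pays the startup
cost once for all the mails in it. A minimal sketch of such a wrapper
follows, assuming sa-learn is on the PATH; the function names are
hypothetical, and the only sa-learn options used are --spam and --ham.
It skips an empty directory and hands back whatever summary sa-learn
prints.

```python
import subprocess
from pathlib import Path


def build_learn_command(kind, directory):
    """Return the argv for one batch sa-learn run over a directory."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", str(directory)]


def learn_directory(kind, directory):
    """Learn every message in a directory with a single sa-learn call.

    Returns None without running anything when the directory holds no
    files, otherwise the text sa-learn wrote to stdout.
    """
    directory = Path(directory)
    if not any(p.is_file() for p in directory.iterdir()):
        return None  # empty directory: nothing to learn
    result = subprocess.run(
        build_learn_command(kind, directory),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sa-learn fails
    )
    return result.stdout.strip()
```

Calling learn_directory("spam", some_dir) from the cron script would
then cover both the empty-directory case and the feedback in one place.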

> > You can try using bayes_learn_to_journal - and do a separate
> > sa-learn --sync job in cron.   Learning to the journal is faster.
> 
> I'll look into that.

I wouldn't bother setting that just to speed up learning. AFAIK the
point of bayes_learn_to_journal is to prevent autolearning from
slowing down classification. The gdbm backend uses a simple
reader-writer lock, so updating token counts locks out all the other
spamd processes from the database. If you have enough active spamd
processes to justify it, updating to the journal avoids lock
contention. The downside is that the updates don't take effect until
the sync.

My guess is that it doesn't really speed up learning; it just defers
some of the work until the sync. There's not much point in that,
since you could just defer the sa-learn run itself.
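
To make the trade-off concrete, here is a toy model of the two modes.
It is only a sketch of the idea, not SpamAssassin's actual code: the
class, its methods, and the sample tokens are all made up for
illustration. It counts how often the database would be locked
exclusively and shows that journaled updates stay invisible until the
sync.

```python
from collections import Counter


class ToyBayesStore:
    """Toy token store; not how SpamAssassin really implements this."""

    def __init__(self):
        self.counts = Counter()   # what classification reads
        self.journal = []         # pending updates, unseen until sync
        self.exclusive_locks = 0  # times readers would be locked out

    def learn_direct(self, tokens):
        # Update the database right away: one exclusive lock per message.
        self.exclusive_locks += 1
        self.counts.update(tokens)

    def learn_to_journal(self, tokens):
        # Append only; the main database is not locked here.
        self.journal.append(list(tokens))

    def sync(self):
        # Apply every pending update under a single exclusive lock.
        if not self.journal:
            return
        self.exclusive_locks += 1
        for tokens in self.journal:
            self.counts.update(tokens)
        self.journal.clear()


# Made-up sample messages, already split into tokens.
messages = [["viagra", "free"], ["free", "offer"], ["meeting"]]

direct = ToyBayesStore()
for tokens in messages:
    direct.learn_direct(tokens)

journaled = ToyBayesStore()
for tokens in messages:
    journaled.learn_to_journal(tokens)
before_sync = dict(journaled.counts)  # still empty: nothing applied yet
journaled.sync()
```

In this model the direct store takes one exclusive lock per message and
the journaled one takes a single lock at sync time, but both end up
applying exactly the same token updates. The work is moved, not
removed.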
