Faisal N Jawdat wrote:
>
> if sa-learn already does this internally then it's doing it rather
> inefficiently.  20 seconds to pull a message id and compare it against
> the db (berkeleydb, fwiw)?
>
OK, I just did some testing. Something is *VERY* wrong with your
system. Are you running out of RAM and swapping?

Either that or you've got a lot of mail coming in and you're waiting a
long time for the bayes DB to be unlocked. 20 seconds does sound about
right for a bayes-lock timeout...

Learning one message should take less than a second.

Here are my test results, using SA 3.1.8, freshly downloaded.
The machine used is an Athlon 64 3200+ (2 GHz) with 2 GB of DDR RAM.
The hard drive used is an IDE drive, Maxtor 94610U6.
The operating system was Fedora Core 4.
The system was not doing anything else at the time, no background SA
processes, etc.

For repeatability, I used part of the public corpus:

http://spamassassin.apache.org/publiccorpus/20030228_easy_ham.tar.bz2

Single message tests:
----------------------
Prestep:
    rm ~/.spamassassin/bayes*
Message used: 01251

First run of sa-learn --nonspam
    Real 0.841 sec
    User 0.716 sec
    Sys 0.047 sec
    Learned 1 of 1 messages.
    real speed: 0.841 seconds per message

Second run
    Real 0.770 sec
    User 0.682 sec
    Sys 0.047 sec
    Learned 0 of 1 messages.
    real speed: 0.770 seconds per message
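
For anyone who wants to repeat this kind of measurement from a script rather than the shell's `time`, here is a minimal sketch. It is not how the numbers above were collected; `time_command` is a hypothetical helper name, and the child CPU times come from `os.times()`, which only reports meaningful values on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (real, user, sys) seconds for the child process."""
    before = os.times()
    start = time.perf_counter()
    # Output is discarded; only the timings are of interest here.
    subprocess.run(argv, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = os.times()
    # CPU time consumed by waited-for children between the two snapshots.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


# Stand-in command so the sketch runs anywhere. To time a learning run,
# pass something like ["sa-learn", "--nonspam", "<path to message>"] instead.
real, user, system = time_command([sys.executable, "-c", "pass"])
print(f"Real {real:.3f} sec  User {user:.3f} sec  Sys {system:.3f} sec")
```

Run it twice against the same message, as above, to see the difference between a first learn and a re-learn of an already known message.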

Whole batch tests:
----------------------
Prestep:
    rm ~/.spamassassin/bayes*
Messages used: whole directory of 2501 messages

First run of sa-learn --nonspam easy_ham/
    Real 3m 11.445 sec
    User 2m 11.850 sec
    System 0m 5.177 sec
    Learned 2501 of 2501 messages.
    real speed: 0.07654 seconds per message

Second run sa-learn --nonspam easy_ham/
    Real 0m 53.926 sec
    User 2m 11.850 sec
    System 0m 0.277 sec
    Learned 0 of 2501 messages.
    real speed: 0.02156 seconds per message
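
The "real speed" figures are just wall-clock time divided by the number of messages. A small sketch of that arithmetic, using the Real times and message counts reported above (the function name `seconds_per_message` is only for illustration):

```python
def seconds_per_message(minutes, seconds, messages):
    """Wall-clock seconds per message for a run of `messages` messages."""
    return (minutes * 60 + seconds) / messages


single_first = seconds_per_message(0, 0.841, 1)      # single message, first run
batch_first = seconds_per_message(3, 11.445, 2501)   # whole directory, first run
batch_second = seconds_per_message(0, 53.926, 2501)  # whole directory, second run

print(f"single first run: {single_first:.5f} s/msg")
print(f"batch first run:  {batch_first:.5f} s/msg")
print(f"batch second run: {batch_second:.5f} s/msg")
print(f"single vs batch:  {single_first / batch_first:.1f}x")
```

Per message, the batch run comes out about 11 times faster than learning a single message, which suggests most of the 0.8 seconds in the single-message case is startup cost rather than the learning itself.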
