Faisal N Jawdat wrote:
>
> if sa-learn already does this internally then it's doing it rather
> inefficiently. 20 seconds to pull a message id and compare it against
> the db (berkeleydb, fwiw)?
>

Ok, I just did some testing. Something is *VERY* wrong with your system. Are you running out of RAM and swapping?
Either that, or you've got a lot of mail coming in and you're waiting a long time for the bayes DB to be unlocked. 20 seconds does sound about right for a bayes-lock timeout...

Learning one message should take less than a second.

Here are my test results, using SA 3.1.8, freshly downloaded.

The machine used is an Athlon 64 3200+ (2 GHz) with 2 GB of DDR RAM. The hard drive is an IDE drive, a Maxtor 94610U6. The operating system was Fedora Core 4. The system was not doing anything else at the time: no background SA processes, etc.

For repeatability, I used part of the public corpus:
http://spamassassin.apache.org/publiccorpus/20030228_easy_ham.tar.bz2

Single message tests:
----------------------
Prestep: rm ~/.spamassassin/bayes*
Message used: 01251

First run of sa-learn --nonspam
Real   0.841 sec
User   0.716 sec
Sys    0.047 sec
Learned 1 of 1 messages.
real speed: 0.841 seconds per message

Second run
Real   0.770 sec
User   0.682 sec
Sys    0.047 sec
Learned 0 of 1 messages.
real speed: 0.77 seconds per message

Whole batch tests:
----------------------
Prestep: rm ~/.spamassassin/bayes*
Messages used: whole directory of 2501 messages

First run of sa-learn --nonspam easy_ham/
Real   3m 11.445 sec
User   2m 11.850 sec
System 0m 5.177 sec
Learned 2501 of 2501 messages.
real speed: 0.07654 seconds per message

Second run of sa-learn --nonspam easy_ham/
Real   0m 53.926 sec
User   2m 11.850 sec
System 0m 0.277 sec
Learned 0 of 2501 messages.
real speed: 0.02156 seconds per message
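
For anyone who wants to check the "real speed" figures for the batch runs, here is a minimal sketch of the arithmetic: wall-clock ("Real") time converted to seconds, divided by the number of messages in the directory. The helper names `to_seconds` and `per_message` are made up for illustration and are not part of SpamAssassin.

```python
# Recompute the per-message "real speed" from the batch timings above.
# to_seconds() and per_message() are illustrative helpers, not SA functions.

def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'Xm Y.YYY sec' timing into plain seconds."""
    return minutes * 60 + seconds


def per_message(real_seconds: float, message_count: int) -> float:
    """Average wall-clock seconds spent per message."""
    return real_seconds / message_count


MESSAGES = 2501  # size of the easy_ham directory used in the batch tests

# First batch run: Real 3m 11.445 sec
first_run = per_message(to_seconds(3, 11.445), MESSAGES)

# Second batch run (everything already learned): Real 0m 53.926 sec
second_run = per_message(to_seconds(0, 53.926), MESSAGES)

print(f"first run:  {first_run:.5f} seconds per message")
print(f"second run: {second_run:.5f} seconds per message")
```

Both results are well under a tenth of a second per message, versus the 20 seconds reported in the quoted message.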