From: Cecil Westerhof <ce...@decebal.nl>
   Date: Sat, 09 Jan 2010 16:24:56 +0100
   
   Jeff Mincy <j...@delphioutpost.com> writes:
   
   >    I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   >    more time with 3.2.5 than it took with 3.0.4. Can this be true?
   >    
   >    It is not a problem, because it is done by cron-tab, but I am just
   >    curious.
   >
   > You can use spamc -L spam/ham to learn messages.  spamc -L is faster
   > than sa-learn.  The spamd daemon needs to be started with
   > --allow-tell.
   
   That is not really an answer to my question. ;-)

I doubt that bayes learning has slowed down significantly.
I would expect the choice of bayes_store_module, learning to the
journal, whether auto expiration runs, and lock contention to
matter more than the version.

   But it does not seem to be interesting in my situation.
   First my code has to grow from:
       sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
   to:
       for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
           spamc -L ${typeStr} <${i}
       done
   
   Which is not even enough, because I need to take care of the situation
   that the directory is empty and I need to implement code to show the
   messages delivered by sa-learn.

Oh.  You're learning all of the messages in a directory.  spamc -L is
faster than sa-learn for learning single messages because sa-learn is
a perl script that has to load Mail::SpamAssassin each time.  For a
large directory the slower startup of sa-learn is less of an issue.
sa-learn is fine for doing directories.
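
If you ever do want the spamc loop, the empty-directory case only
needs a one-line guard.  A rough sketch, assuming spamd was started
with --allow-tell; learn_dir and the LEARNER override are names I made
up for illustration, not part of SpamAssassin:

```shell
#!/bin/sh
# learn_dir KIND DIR: feed every regular file in DIR to "spamc -L KIND".
# KIND is "spam" or "ham".  LEARNER is a hypothetical override hook so
# the loop can be dry-run without a running spamd.
learn_dir() {
    kind=$1
    dir=$2
    count=0
    for msg in "$dir"/*; do
        # With an empty directory the glob stays literal and is not a
        # file, so this guard skips it.
        [ -f "$msg" ] || continue
        ${LEARNER:-spamc} -L "$kind" < "$msg"
        count=$((count + 1))
    done
    echo "Examined $count message(s) in $dir"
}
```

Call it as learn_dir spam "${HOME}/Maildir/.SpamDir.${dirStr}/cur".
It still starts one spamc process per message, so for a whole
directory sa-learn remains the simpler choice.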

   With a low level of spam it works, but if the volume becomes bigger, it
   does not:
       date
       echo ${echoStr}
       sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
       date
       for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
           spamc -L ${typeStr} <${i}
       done
       echo learned in the new way
       date
   gives:
       za jan  9 16:09:25 CET 2010
       Increase
       Learned tokens from 0 message(s) (45 message(s) examined)
       za jan  9 16:09:40 CET 2010
       learned in the new way
       za jan  9 16:10:00 CET 2010
   
   So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
   code. Besides taking care of an empty directory, I also need to implement
   the feedback given by sa-learn.)
   
You learned tokens from 0 messages and looked at 45 messages.
You had already learned from those 45 messages, so this is
just timing how fast it can do nothing.

   > You can try using bayes_learn_to_journal - and do a separate sa-learn
   > --sync job in cron.   Learning to the journal is faster.
   
   I'll look into that.
   
   
   > Also, What is the size of your database?   Maybe you are spending lots
   > of time doing expires or something.
   
   sa-learn --dump magic gives:
       0.000          0          3          0  non-token data: bayes db version
       0.000          0      57538          0  non-token data: nspam
       0.000          0      74876          0  non-token data: nham
       0.000          0     166338          0  non-token data: ntokens
       0.000          0 1257478501          0  non-token data: oldest atime
       0.000          0 1263049426          0  non-token data: newest atime
       0.000          0 1263049538          0  non-token data: last journal sync atime
       0.000          0 1263044805          0  non-token data: last expiry atime
       0.000          0    5529600          0  non-token data: last expire atime delta
       0.000          0       1868          0  non-token data: last expire reduction count
   
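
The atime fields in that dump are Unix epoch seconds and the atime
delta is a plain number of seconds.  A small sketch for reading them,
assuming GNU date (the -d @N form is not available in every date
implementation); epoch_to_utc and secs_to_days are helper names I made
up:

```shell
#!/bin/sh
# epoch_to_utc SECONDS: print an epoch timestamp as a UTC date and time.
# Relies on GNU date's "-d @SECONDS" syntax.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# secs_to_days SECONDS: whole days contained in a number of seconds.
secs_to_days() {
    echo $(( $1 / 86400 ))
}
```

For the values above, epoch_to_utc 1263044805 prints
2010-01-09 13:46:45 (the last expiry) and secs_to_days 5529600
prints 64.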
Your database has 166338 tokens, which is larger than the default
bayes_expiry_max_db_size of 150000.  The last expiration ran this morning
at 8:46.  You could try letting the bayes database get larger and turn
off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
to add something to cron to periodically expire tokens.
bayes_auto_expire is fine for lower volumes of email, but can get in
the way with higher volumes.
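
Something along these lines could serve as the periodic job.  It is
only a sketch: the two settings in the comments are what I am assuming
you would put in local.cf or user_prefs, and bayes_maintenance and the
SA_LEARN override are names I made up:

```shell
#!/bin/sh
# Assumed SpamAssassin settings (local.cf or user_prefs):
#   bayes_learn_to_journal 1
#   bayes_auto_expire 0
# SA_LEARN is a hypothetical override so the job can be dry-run.
bayes_maintenance() {
    # Merge the journal into the main database first ...
    ${SA_LEARN:-sa-learn} --sync || return 1
    # ... then attempt the expiry that bayes_auto_expire no longer triggers.
    ${SA_LEARN:-sa-learn} --force-expire
}
```

Call bayes_maintenance at the end of the script and run the script
from cron, say once a night, as the user that owns the bayes database.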
-jeff
