From: Alex <mysqlstud...@gmail.com>
   Date: Sat, 9 Jan 2010 21:13:24 -0500
   
   >   sa-learn --dump magic gives:
   >       0.000          0          3          0  non-token data: bayes db version
   >       0.000          0      57538          0  non-token data: nspam
   >       0.000          0      74876          0  non-token data: nham
   >       0.000          0     166338          0  non-token data: ntokens
   >       0.000          0 1257478501          0  non-token data: oldest atime
   >       0.000          0 1263049426          0  non-token data: newest atime
   >       0.000          0 1263049538          0  non-token data: last journal sync atime
   >       0.000          0 1263044805          0  non-token data: last expiry atime
   >       0.000          0    5529600          0  non-token data: last expire atime delta
   >       0.000          0       1868          0  non-token data: last expire reduction count
   >
   > Your database has 166338 tokens which is larger than the default
   > bayes_expiry_max_db_size 150000.  The last expiration ran this morning
   > at 8:46.  You could try letting the bayes database get larger and turn
   > off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
   > to add something to cron to periodically expire tokens.
   > bayes_auto_expire is fine for lower volumes of email, but can get in
   > the way with higher volumes.
   
   Also, what is the drawback with using auto_expire on larger volumes?
   Is it the locking delay and preventing learning new messages during
   that time? If you were to put it in cron to manually do an expiry, how
   often should it be run?
   
Expiration takes an exclusive lock on the bayes database while it runs.
Expiration presumably takes longer on larger volumes, but it is still pretty
fast. Running expiration daily or weekly should be more than sufficient.

   Is there anything that should be tested prior to making this change,
   or is it pretty benign?

Yes - turning off bayes_auto_expire is pretty benign.
You may not need to make this type of change.   The default options
for bayes work fine for lower email volumes.
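
To check whether a database is actually over the default limit, the
"non-token data" rows of `sa-learn --dump magic` can be parsed. The following
is only a rough sketch: the helper name parse_dump_magic and the constant
DEFAULT_MAX_DB_SIZE are made up for illustration, and it assumes the value
sits in the third numeric column, as in the dump quoted in this thread.

```python
import re
from datetime import datetime, timezone

# Default bayes_expiry_max_db_size, as quoted earlier in this thread.
DEFAULT_MAX_DB_SIZE = 150000

# Assumption: the value is the number two columns before "non-token data:",
# which is how the dump quoted in this thread is laid out.
MAGIC_ROW = re.compile(r"(\d+)\s+\d+\s+non-token data:\s*(.+?)\s*$")


def parse_dump_magic(text):
    """Return {label: value} for each 'non-token data' row in the dump."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_ROW.search(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


# Rows copied from the dump in this thread (wrapped lines joined).
SAMPLE = """\
0.000          0     166338          0  non-token data: ntokens
0.000          0 1263044805          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
"""

magic = parse_dump_magic(SAMPLE)
over_limit = magic["ntokens"] > DEFAULT_MAX_DB_SIZE
last_expiry = datetime.fromtimestamp(magic["last expiry atime"], tz=timezone.utc)
delta_days = magic["last expire atime delta"] / 86400

print("ntokens over default limit:", over_limit)
print("last expiry (UTC):", last_expiry.isoformat())
print("expire atime delta in days:", delta_days)
```

With the numbers above, ntokens is over the default limit, the last expiry
comes out as 13:46:45 UTC on 2010-01-09 (8:46 US Eastern), and the expire
atime delta is 64 days.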

   I suppose you could take the ntokens value before, and subtract it
   from the after value to see how many tokens were expired, right? It
   would be interesting to see how many tokens are expired on a regular
   basis, but not sure that's very useful, just interesting.

sa-learn tells you how many tokens were deleted when you run it with
--force-expire, for example:
 expired old bayes database entries in 152 seconds
 1516428 entries kept, 115692 deleted
 token frequency: 1-occurrence tokens: 73.76%
 token frequency: less than 8 occurrences: 16.19%
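
If you want to track those counts over time, the summary line can be picked
out of the captured output. This is a minimal sketch, assuming the output has
already been captured as text and has the same format as the example above;
parse_expire_summary is a hypothetical helper name, not part of sa-learn.

```python
import re

# Matches the summary line shown above: "<kept> entries kept, <deleted> deleted".
SUMMARY = re.compile(r"(\d+) entries kept, (\d+) deleted")


def parse_expire_summary(text):
    """Return (kept, deleted) from expiry output, or None if no summary line is found."""
    match = SUMMARY.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Output copied from the example above.
SAMPLE = """\
expired old bayes database entries in 152 seconds
1516428 entries kept, 115692 deleted
token frequency: 1-occurrence tokens: 73.76%
token frequency: less than 8 occurrences: 16.19%
"""

kept, deleted = parse_expire_summary(SAMPLE)
# Share of the pre-expiry token count that was removed in this run.
share = deleted / (kept + deleted)
print(f"kept={kept} deleted={deleted} ({share:.1%} of tokens before expiry)")
```

This gives the same number as subtracting ntokens after the run from ntokens
before it, without needing two separate dumps.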

-jeff
