From: Alex <mysqlstud...@gmail.com>
Date: Sat, 9 Jan 2010 21:13:24 -0500

> sa-learn --dump magic gives:
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      57538          0  non-token data: nspam
> 0.000          0      74876          0  non-token data: nham
> 0.000          0     166338          0  non-token data: ntokens
> 0.000          0 1257478501          0  non-token data: oldest atime
> 0.000          0 1263049426          0  non-token data: newest atime
> 0.000          0 1263049538          0  non-token data: last journal sync atime
> 0.000          0 1263044805          0  non-token data: last expiry atime
> 0.000          0    5529600          0  non-token data: last expire atime delta
> 0.000          0       1868          0  non-token data: last expire reduction count
>
> Your database has 166338 tokens which is larger than the default
> bayes_expiry_max_db_size 150000. The last expiration ran this morning
> at 8:46. You could try letting the bayes database get larger and turn
> off bayes_auto_expire. If you turn off bayes_auto_expire you'll have
> to add something to cron to periodically expire tokens.
> bayes_auto_expire is fine for lower volumes of email, but can get in
> the way with higher volumes.

Also, what is the drawback with using auto_expire on larger volumes? Is it the locking delay and preventing learning new messages during that time? If you were to put it in cron to manually do an expiry, how often should it be run?

You have an exclusive lock when doing expiration. Expiration presumably takes longer on larger volumes, but it is still pretty fast. Running expiration daily or weekly should be more than sufficient.
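
If you want to keep an eye on this from a script rather than by reading the dump by hand, something like the following might do. It is only a rough sketch: it assumes the `sa-learn --dump magic` output has the column layout shown in the dump above, and the names `parse_magic` and `tokens_over_limit` are made up for illustration, not part of SpamAssassin.

```python
import re
from datetime import datetime, timezone

# Default bayes_expiry_max_db_size, as quoted in the message above.
DEFAULT_MAX_DB_SIZE = 150000

# Matches e.g. "0.000  0  166338  0  non-token data: ntokens",
# tolerating leading "> " quote markers from a pasted email.
MAGIC_LINE = re.compile(
    r"^[>\s]*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
)


def parse_magic(text):
    """Return {label: integer value} for each non-token data line."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def tokens_over_limit(values, max_db_size=DEFAULT_MAX_DB_SIZE):
    """How many tokens the database holds beyond max_db_size (0 if none)."""
    return max(0, values["ntokens"] - max_db_size)


# Sample lines copied from the dump quoted above. In practice the text
# could come from running `sa-learn --dump magic` and capturing stdout.
SAMPLE = """\
0.000 0 166338 0 non-token data: ntokens
0.000 0 1263044805 0 non-token data: last expiry atime
"""

magic = parse_magic(SAMPLE)
print(tokens_over_limit(magic))  # 16338

# The atime fields are Unix timestamps; shown here in UTC.
last_expiry = datetime.fromtimestamp(magic["last expiry atime"], tz=timezone.utc)
print(last_expiry.isoformat())  # 2010-01-09T13:46:45+00:00
```

13:46 UTC is 8:46 at the -0500 offset of the message, which agrees with the expiry time mentioned above.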
Is there anything that should be tested prior to making this change, or is it pretty benign?

Yes - turning off bayes_auto_expire is pretty benign. You may not need to make this type of change. The default options for bayes work fine for lower email volumes.

I suppose you could take the ntokens value before, and subtract it from the after value to see how many tokens were expired, right? It would be interesting to see how many tokens are expired on a regular basis, but not sure that's very useful, just interesting.

sa-learn tells you how many tokens were deleted when you do --force-expire, for example:

expired old bayes database entries in 152 seconds
1516428 entries kept, 115692 deleted
token frequency: 1-occurrence tokens: 73.76%
token frequency: less than 8 occurrences: 16.19%

-jeff
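
For tracking the numbers over time, here is a rough sketch of both approaches: pulling the counts out of a --force-expire report, and the before/after ntokens subtraction. It assumes the report wording matches the example above; `parse_expire_report` and `expired_between` are hypothetical helper names, not anything sa-learn provides.

```python
import re


def parse_expire_report(text):
    """Extract kept/deleted counts and duration from a --force-expire report."""
    counts = re.search(r"(\d+) entries kept, (\d+) deleted", text)
    duration = re.search(r"in (\d+) seconds", text)
    if counts is None:
        raise ValueError("no 'entries kept, ... deleted' line found")
    kept, deleted = int(counts.group(1)), int(counts.group(2))
    total = kept + deleted
    return {
        "kept": kept,
        "deleted": deleted,
        "seconds": int(duration.group(1)) if duration else None,
        # Share of the pre-expiry tokens that was removed.
        "deleted_fraction": deleted / total if total else 0.0,
    }


def expired_between(ntokens_before, ntokens_after):
    """Before-minus-after estimate of expired tokens.

    Tokens learned between the two readings make this an underestimate.
    """
    return ntokens_before - ntokens_after


# Report lines copied from the example above.
REPORT = """\
expired old bayes database entries in 152 seconds
1516428 entries kept, 115692 deleted
"""

report = parse_expire_report(REPORT)
print(report["deleted"], f"{report['deleted_fraction']:.1%}")  # 115692 7.1%
```

The report is probably the more reliable source, since the ntokens difference also reflects anything learned between the two readings.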