
Are there any recommendations for tuning the size
(number of tokens) of a Bayes DB?

I.e., is there an optimal number of tokens, a recommended
maximum number of tokens, etc.?

I currently have something like this:

sa-learn --dump magic
0.000  0          2          0  non-token data: bayes db version
0.000  0     359318          0  non-token data: nspam
0.000  0      36472          0  non-token data: nham
0.000  0    4316998          0  non-token data: ntokens

The autolearn thresholds are at 0.15 for ham and 10.5 for spam, and auto-expiry
is turned off for the database (expiry is currently run by a cron job every 4 weeks).

This makes for a bayes_toks file of approximately 160 MB.
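
To put those numbers in perspective, here is a quick sketch in Python that parses the dump above and works out the learned spam-to-ham ratio and a rough bytes-per-token figure. The function name parse_dump_magic is made up for illustration, the value is simply taken from the third column as it appears in the output above, and the file size is assumed to be 160 * 10^6 bytes.

```python
# Output of `sa-learn --dump magic`, copied from above.
DUMP = """\
0.000  0          2          0  non-token data: bayes db version
0.000  0     359318          0  non-token data: nspam
0.000  0      36472          0  non-token data: nham
0.000  0    4316998          0  non-token data: ntokens
"""


def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        columns, name = line.split("non-token data:", 1)
        # The value is the third whitespace-separated column in the dump above.
        values[name.strip()] = int(columns.split()[2])
    return values


magic = parse_dump_magic(DUMP)

# Ratio of learned spam messages to learned ham messages.
spam_to_ham = magic["nspam"] / magic["nham"]

# Assumption: "160 MB" is taken as 160 * 10**6 bytes.
bytes_per_token = 160 * 10**6 / magic["ntokens"]

print(f"spam:ham ratio  {spam_to_ham:.1f}")
print(f"bytes per token {bytes_per_token:.0f}")
```

So that is roughly 10 learned spam messages per ham message, and somewhere around 37 bytes of bayes_toks per token under the size assumption above.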

Should I "expire" it more often (the last expiration run was a few weeks ago)?


PS: why is there so much more nspam than nham? Well, currently about
60% of our received internet email traffic is spam... what a waste of resources.
