Are there any recommendations for tuning the size (number of tokens) of a Bayes DB?
I.e., is there an optimal number of tokens, a maximum recommended number of tokens, etc.?
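(In case it matters: my understanding from the docs is that the main knob for this is
bayes_expiry_max_db_size in local.cf, at least in newer SpamAssassin versions; the value
below is just what I believe the documented default to be, not something I have set myself:)

  bayes_expiry_max_db_size  150000   # max tokens kept after an expiry run (documented default, I believe)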
I currently have something like this:
sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0     359318          0  non-token data: nspam
0.000          0      36472          0  non-token data: nham
0.000          0    4316998          0  non-token data: ntokens
The autolearn thresholds are set to 0.15 for ham and 10.5 for spam, and the database
doesn't auto-expire (expiry is currently done via a cron job every 4 weeks).
This results in a bayes_toks file of roughly 160 MB.
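(For what it's worth, the relevant part of my local.cf looks roughly like the sketch below;
the option names are the ones I believe current SpamAssassin uses, so treat this as an
illustration rather than a verbatim copy of my config:)

  bayes_auto_learn_threshold_nonspam  0.15   # autolearn as ham below this score
  bayes_auto_learn_threshold_spam     10.5   # autolearn as spam above this score
  bayes_auto_expire                   0      # no automatic expiry; we expire from cron instead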
should i "expire" it more often (last expiration run was a few weeks ago) ?
Thanks
PS: Why is there so much more nspam than nham? Well, currently about
60% of our received internet email traffic is spam... what a waste
of resources.