What sort of guidelines/rules of thumb/formulas have people used to determine the bayes_expiry_max_db_size setting for a sitewide bayes database?
Personally, I'm using 200,000 on a 100-user site-wide setup. This means my Bayes DB varies between 200k and 150k tokens (75% of max) instead of the default range of 150k down to about 112.5k.
I could easily double that, but so far I haven't seen much need. Since this is a business, most of the email here goes to users with more-or-less similar email patterns. If your users' interests vary widely on the ham side, you might want to expand it further.
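As a rough illustration of that expiry cycle, here is a minimal sketch. It assumes the rule given in the SpamAssassin documentation for bayes_expiry_max_db_size: when expiry runs, it keeps the larger of 75% of the configured maximum or 100,000 tokens. The helper names `post_expiry_tokens` and `db_size_range` are hypothetical.

```python
# Hypothetical helpers: estimate the range a Bayes token count cycles through,
# assuming expiry keeps max(75% of the configured maximum, 100,000 tokens).
EXPIRY_FRACTION = 0.75
EXPIRY_FLOOR_TOKENS = 100_000


def post_expiry_tokens(max_db_size: int) -> int:
    """Approximate token count left after an expiry run."""
    return max(int(max_db_size * EXPIRY_FRACTION), EXPIRY_FLOOR_TOKENS)


def db_size_range(max_db_size: int) -> tuple[int, int]:
    """(low, high) token counts between expiry runs."""
    return post_expiry_tokens(max_db_size), max_db_size


if __name__ == "__main__":
    for setting in (150_000, 200_000):
        low, high = db_size_range(setting)
        print(f"bayes_expiry_max_db_size {setting}: ~{low:,} to {high:,} tokens")
```

For the default of 150,000 this gives a low of 112,500 tokens, and for 200,000 it gives 150,000. For any setting below 100,000 the floor ends up above the maximum, which fits the observation that such settings won't work.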
> How big are your bayes_* files on disk? I would say personally
> that a single-user set of Bayes files shouldn't be much more than
> 8-10M total; a medium-size site Bayes should be ~40M _toks +
> whatever _seen takes up; and a large sitewide Bayes may run up to
> ~100M. I wouldn't go much higher due to the IO/memory/filesystem
> cache load.
That seems like reasonable advice. If 8MB is 150,000 tokens, 100MB would be about 1,875,000 tokens. That's probably a good upper bound.
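The size-to-token arithmetic can be sketched as a simple linear conversion. This assumes the ~8MB per 150,000 tokens ratio holds at larger sizes, which is an approximation; real on-disk size depends on the storage backend. The function names are hypothetical.

```python
# Rough linear conversion between Bayes token count and on-disk size,
# assuming ~8 MB per 150,000 tokens scales linearly (an approximation).
TOKENS_PER_REFERENCE = 150_000
MB_PER_REFERENCE = 8.0


def tokens_for_size_mb(size_mb: float) -> int:
    """Approximate token count that fits in size_mb megabytes."""
    return round(size_mb / MB_PER_REFERENCE * TOKENS_PER_REFERENCE)


def size_mb_for_tokens(tokens: int) -> float:
    """Approximate on-disk size in megabytes for a given token count."""
    return tokens / TOKENS_PER_REFERENCE * MB_PER_REFERENCE


if __name__ == "__main__":
    print(f"{tokens_for_size_mb(100):,} tokens")   # 1,875,000 tokens
    print(f"{tokens_for_size_mb(40):,} tokens")    # 750,000 tokens
    print(f"{size_mb_for_tokens(200_000):.1f} MB")  # 10.7 MB
```

Under the same assumption, the ~40M figure for a medium-size site works out to roughly 750k tokens.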
Looking around on the web, most of the configs I see quoted fall between 100k and 1.5M tokens (excluding people trying to set it below 100k, which will never work, since expiry always keeps at least 100,000 tokens).
I also saw one guide with 20M in its config, but I wonder whether the author made a typo or mistakenly thinks the value is specified in bytes rather than tokens. That Bayes DB would be about 1GB.
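A quick sanity check along these lines might flag settings that fall outside the ranges discussed here. This is a hypothetical sketch: it assumes the ~8MB per 150,000 tokens ratio, treats 100,000 tokens as the expiry floor, and uses ~100MB as the suggested upper bound for a large sitewide DB.

```python
# Hypothetical sanity check for a bayes_expiry_max_db_size value, assuming
# ~8 MB per 150,000 tokens and the thresholds discussed in this thread.
TOKENS_PER_8_MB = 150_000
MIN_USEFUL_TOKENS = 100_000  # expiry keeps at least this many tokens
MAX_REASONABLE_MB = 100.0    # suggested upper bound for a large sitewide DB


def estimated_mb(tokens: int) -> float:
    """Approximate on-disk size in megabytes for a token count."""
    return tokens / TOKENS_PER_8_MB * 8.0


def check_setting(tokens: int) -> str:
    """Return a short verdict for a proposed token limit."""
    if tokens < MIN_USEFUL_TOKENS:
        return "below the 100k expiry floor"
    size = estimated_mb(tokens)
    if size > MAX_REASONABLE_MB:
        return f"~{size:,.0f} MB on disk; possibly meant as bytes?"
    return f"ok (~{size:,.0f} MB)"


if __name__ == "__main__":
    for value in (50_000, 200_000, 1_500_000, 20_000_000):
        print(f"{value:>12,}: {check_setting(value)}")
```

The 20,000,000 case comes out at roughly 1,067 MB, matching the ~1GB estimate.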
http://www.postfix-howto.de/spamassassin_conf.html
With 2,000 users, I might consider starting with something in the middle, about 800k or so, and see how it goes.
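One way to read "in the middle" is the arithmetic midpoint of the 100k to 1.5M range of quoted configs, which lands on 800k. The sketch below assumes that reading and the ~8MB per 150,000 tokens ratio. It emits the corresponding SpamAssassin config line (the value is a token count, not bytes). The helper names are hypothetical.

```python
# Sketch: pick a starting bayes_expiry_max_db_size as the midpoint of the
# commonly quoted 100k-1.5M range (an assumed reading of "in the middle").
LOW_TOKENS = 100_000
HIGH_TOKENS = 1_500_000


def starting_setting(low: int = LOW_TOKENS, high: int = HIGH_TOKENS) -> int:
    """Integer midpoint of the quoted token range."""
    return (low + high) // 2


def config_line(tokens: int) -> str:
    """SpamAssassin config line; the value is a token count, not bytes."""
    return f"bayes_expiry_max_db_size {tokens}"


if __name__ == "__main__":
    value = starting_setting()
    approx_mb = value / 150_000 * 8.0  # assumes ~8 MB per 150k tokens
    print(config_line(value))          # bayes_expiry_max_db_size 800000
    print(f"# roughly {approx_mb:.0f} MB at the limit")
```

That suggests a DB of roughly 43MB at its limit, comfortably under the ~100MB ceiling, which leaves room to grow if accuracy suffers.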