What sort of guidelines, rules of thumb, or formulas have people used to determine the bayes_expiry_max_db_size setting for a sitewide Bayes database?

The Mail::SpamAssassin::Conf man page says the default is 150000 tokens (which, it says, is equivalent to roughly 8 MB). It seems a little extreme to simply multiply that number by the number of users on the server: 8 MB * 2000 users = ~16 GB!

I'm planning on hosting this DB in MySQL (an SQL-based Bayes store seems better suited than the default file-based option for a sitewide DB), but clearly 16 GB is just too big. Presumably something smaller would work well enough, but how small is too small, and how big is too big?

The only advice I've found in the list archives is:

http://marc.theaimsgroup.com/?l=spamassassin-users&m=109033803207027&w=2

> How big are your bayes_* files on disk? I would say personally
> that a single-user set of Bayes files shouldn't be much more than
> 8-10M total; a medium-size site Bayes should be ~40M _toks +
> whatever _seen takes up; and a large sitewide Bayes may run up to
> ~100M. I wouldn't go much higher due to the IO/memory/filesystem
> cache load.

Thanks!
Ben
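
P.S. To make the arithmetic above concrete, here is a rough Python sketch. It assumes database size scales linearly with token count at the man page's ratio (150000 tokens to roughly 8 MB), which is only a guess for a MySQL backend, where per-token overhead may well differ. The helper names are my own, and the 40 MB and 100 MB targets are borrowed from the archive post quoted above, reading its "M" as megabytes.

```python
# Back-of-the-envelope sizing for bayes_expiry_max_db_size.
# ASSUMPTION: size grows linearly with token count, using the man page's
# figure of 150000 tokens ~= 8 MB. Real per-token overhead (especially in
# MySQL) will differ, so treat every number here as a ballpark estimate.

DEFAULT_TOKENS = 150_000   # default bayes_expiry_max_db_size
DEFAULT_SIZE_MB = 8.0      # approximate size the man page gives for it


def tokens_for_size(target_mb: float) -> int:
    """Estimate a token limit that would yield roughly target_mb megabytes."""
    return int(DEFAULT_TOKENS * target_mb / DEFAULT_SIZE_MB)


def size_for_tokens(tokens: int) -> float:
    """Estimate the database size in MB for a given token limit."""
    return DEFAULT_SIZE_MB * tokens / DEFAULT_TOKENS


if __name__ == "__main__":
    # Targets taken from the quoted archive advice (hypothetical mapping).
    for label, mb in [("medium site", 40), ("large site", 100)]:
        print(f"{label}: ~{mb} MB -> about {tokens_for_size(mb)} tokens")

    # The naive "default times number of users" estimate for 2000 users.
    naive_mb = size_for_tokens(DEFAULT_TOKENS * 2000)
    print(f"naive per-user scaling: ~{naive_mb / 1000:.0f} GB")
```

Under that linear assumption, the 40 MB and 100 MB figures would correspond to limits of about 750000 and 1875000 tokens, far below the naive 2000-user estimate.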