Ben Poliakoff wrote:
> What sort of guidelines/rules of thumb/formulas have people used to
> determine the bayes_expiry_max_db_size setting for a sitewide bayes
> database?

Modify until it feels right.  <g>

> The Mail::SpamAssassin::Conf man page says the default is 150000
> tokens (which, it says, is equivalent to roughly 8MB).  It seems a
> little extreme to simply multiply that number by the number of users
> on the server.
>     8MB * 2000 users = ~16GB!

Definitely extreme.  Keep in mind that a global Bayes DB doesn't grow
linearly with the number of users: compared with per-user mail flow,
systemwide mail contains a LOT of similar or identical tokens, and each
one is stored only once.  That holds even in an ISP environment.

> I'm planning on hosting this db in mysql (an SQL based bayes seems
> better suited than the default "file based" option for a sitewide
> DB),

*shrug*  I've been running with a file-based global Bayes for several
years on a number of systems now.  Apart from general "not enough
CPU/memory/disk/IO" problems on one system, I've had zero trouble.
(Bayes, in and of itself, is a pretty trivial load on that system
AFAICS.)

> The only advice I've found in the list archives is:
>http://marc.theaimsgroup.com/?l=spamassassin-users&m=109033803207027&w=2
>     > How big are your bayes_* files on disk?  I would say personally
>     > that a single-user set of Bayes files shouldn't be much more
>     > than 8-10M total; a medium-size site Bayes should be ~40M _toks
>     > + whatever _seen takes up; and a large sitewide Bayes may run
>     > up to ~100M.  I wouldn't go much higher due to the
>     > IO/memory/filesystem cache load.

That's probably about the best specific advice you'll get.  ;)

I'd start by trying 1M tokens (~40MB on disk in my setup) and see how
the mail flow looks.  If Bayes accuracy seems to suffer, bump it up to
2M, and so on.  Smaller is better for I/O, but bigger is better for
accuracy.  Exactly where the sweet spot lies will vary with your mail
flow and your particular server setup.
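If it helps to put numbers on "start at 1M and bump it up", here's a
rough sketch that estimates on-disk size for a few candidate token
limits.  The 40 bytes/token ratio is an assumption taken from the ~40MB
per 1M tokens figure above (the man page's 150000 tokens ~ 8MB works out
closer to 53 bytes/token), the 100MB ceiling is borrowed from the quoted
advice, and the 1M step size is just one reading of "and so on".
Measure your own DB before trusting any of it:

```python
# Rough on-disk size estimate for candidate bayes_expiry_max_db_size values.
# ASSUMPTION: ~40 bytes per token (from ~40MB per 1M tokens); the real ratio
# depends on your storage backend and setup, and _seen data is not included.

BYTES_PER_TOKEN = 40      # assumed ratio; re-measure on your own system
CEILING_MB = 100          # rough upper bound from the quoted advice

def estimated_mb(max_tokens: int, bytes_per_token: float = BYTES_PER_TOKEN) -> float:
    """Estimated token-DB size in MB (1 MB = 1,000,000 bytes here)."""
    return max_tokens * bytes_per_token / 1_000_000

def candidates(start: int = 1_000_000, step: int = 1_000_000,
               ceiling_mb: float = CEILING_MB) -> list[int]:
    """Token limits from `start` upward while the estimate stays under the ceiling."""
    sizes = []
    n = start
    while estimated_mb(n) <= ceiling_mb:
        sizes.append(n)
        n += step
    return sizes

for n in candidates():
    print(f"{n:>9,} tokens -> ~{estimated_mb(n):.0f}MB")
```

Whichever value you settle on goes in local.cf as, e.g.,
`bayes_expiry_max_db_size 1000000`.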

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!
