Jason Frisvold <xenophage0 at gmail> writes:

> I've been investigating some recent slowness issues with our mail
> servers and I noticed that the spamassassin database is getting rather
> large.  We process approximately 300,000 mails a day (or more).  The
> bayes_token database is over 1.8 Gig at the moment. (Actually, 1.8 Gig
> for the data, and 1.3 Gig for the index)

If you have fewer than 65,000 accounts you could halve the size of the
id columns.

Those folks *not* storing per-user Bayes statistics -- i.e. with only
one line in bayes_vars -- could take the id column down to one byte, a
TINYINT.

  ALTER TABLE bayes_token  MODIFY id SMALLINT UNSIGNED NOT NULL;
  ALTER TABLE bayes_expire MODIFY id SMALLINT UNSIGNED NOT NULL;
  ALTER TABLE bayes_seen   MODIFY id SMALLINT UNSIGNED NOT NULL;
  ALTER TABLE bayes_vars   MODIFY id SMALLINT UNSIGNED NOT NULL;

The last three won't recover much space, but table indexes should be
kept the same type and size.

Also, try the following.  Since you are keeping Bayes per user, your
spam_count and ham_count probably don't get very high so you can take
those down to SMALLINTs.

  SELECT MAX(ham_count), MAX(spam_count) FROM bayes_token;

  ALTER TABLE bayes_token MODIFY spam_count SMALLINT UNSIGNED NOT NULL,
                          MODIFY ham_count SMALLINT UNSIGNED NOT NULL;

That took 25% off my bayes_token data file and index.  Since MySQL
likes to keep indexes in VM, I think it was worthwhile.


-- 
Ard

Reply via email to