[SAtalk] Bayes configuration questions

Abigail Marshall Wed, 24 Sep 2003 21:00:39 -0700

Here's the issue:

System: SA 2.54 on FreeBSD Unix, Berkeley DB 1.85
(Hash, version 2).


Problem: When bayes_toks grows to more than 5K, it becomes
corrupted during sa-learn and ultimately trashed or lost.

My solution: Set bayes_expiry_max_db_size to a lower value to
force expiry, so that bayes_toks doesn't grow too large.

I did not change the settings for
bayes_expiry_min_db_size or bayes_expiry_scan_count.

Questions/Problems:

1. Why did bayes_toks grow to more than 5K in the first
place?

Documentation for SA 2.5x (sa-learn.html) says:

> Once it hits 5000 bytes, the bayes_toks database is
> locked, and the message counter entry in that database is
> increased accordingly.

2. What is the default value of bayes_expiry_max_db_size
in SA 2.5x, and how large should the resulting file be?

Through experimentation, I have ended up with this setting:

bayes_expiry_max_db_size        150000

With this setting, bayes_toks never grows larger than
2,556 KB.

According to the documentation for SA 2.6 (sa-learn.txt):

> "bayes_expiry_max_db_size" specifies both the auto-expire token count
>  point, as well as the resulting number of tokens after expiry as
>  described above. The default value is 150,000, which is roughly
>  equivalent to a 6Mb database file if you're using DB_File.
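
The 2.6 wording describes a count-based rule: expiry is triggered by the number of tokens, not by the file size. Below is a minimal Python sketch of that idea under stated assumptions, not SpamAssassin's actual expiry code. It assumes each token carries a last-seen timestamp, that the least recently seen tokens are dropped first, and (following the quoted text) that the post-expiry target equals bayes_expiry_max_db_size. The names `Token` and `expire_tokens` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    name: str
    atime: int  # hypothetical: last time the token was seen (e.g. epoch seconds)


def expire_tokens(tokens, max_db_size, target=None):
    """Simplified model of count-based Bayes expiry (not SpamAssassin's code).

    If the token count exceeds max_db_size, keep only the `target` most
    recently seen tokens. Per the quoted 2.6 text, target defaults to
    max_db_size itself; the real implementation may use a different target.
    """
    if target is None:
        target = max_db_size
    if len(tokens) <= max_db_size:
        return list(tokens)  # below the trigger point: nothing expires
    # Assumption: expire least recently seen tokens first.
    newest_first = sorted(tokens, key=lambda t: t.atime, reverse=True)
    return newest_first[:target]


tokens = [Token(f"tok{i}", atime=i) for i in range(10)]
kept = expire_tokens(tokens, max_db_size=6)
print(len(kept), min(t.atime for t in kept))  # prints: 6 4
```

Because the trigger is a token count, the resulting file size depends on how many bytes each token occupies on disk, which the documentation only gives as a rough equivalence.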

Note that my setting is the SAME as the default for 2.6,
but rather than a 6 MB db file, I end up with a 2.5 MB file,
with a Bayes corpus of ~2800 or less.

Documentation for SA 2.5 (sa-learn.html) says:

> bayes_expiry_min_db_size is part of the SpamAssassin
> configuration. The default value is 100000, which is
> roughly equivalent to a 5Mb database file if you're using
> DB_File.

So here is where I am totally confused: from what I can
tell, my setting of bayes_expiry_max_db_size=150000 should
either have no effect whatsoever or leave me with a
bayes_toks file that grows to 5K. Instead, I end up with a
file half that size.
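
The rules of thumb quoted above can be turned into bytes-per-token figures and compared with the observed file. This is a rough sketch, assuming "Mb" and "kb" mean 10^6 and 10^3 bytes and that file size scales linearly with token count (neither documentation excerpt promises this). The helper names are hypothetical.

```python
MB = 1_000_000  # assumption: "Mb" in the docs means 10**6 bytes
KB = 1_000      # assumption: the observed "kb" means 10**3 bytes


def bytes_per_token(size_mb: float, tokens: int) -> float:
    """Bytes per token implied by an 'N tokens ~ X Mb' rule of thumb."""
    return size_mb * MB / tokens


def expected_size_mb(max_db_size: int, bpt: float) -> float:
    """File size (in MB) the rule predicts for a given token count."""
    return max_db_size * bpt / MB


def implied_tokens(observed_bytes: float, bpt: float) -> float:
    """Token count the rule would imply for an observed file size."""
    return observed_bytes / bpt


observed = 2_556 * KB  # largest bayes_toks size reported in the post
for label, bpt in [("SA 2.5x rule", bytes_per_token(5, 100_000)),
                   ("SA 2.6 rule", bytes_per_token(6, 150_000))]:
    print(f"{label}: {bpt:.0f} B/token, "
          f"predicts {expected_size_mb(150_000, bpt):.1f} MB at 150,000 tokens, "
          f"observed file implies ~{implied_tokens(observed, bpt):,.0f} tokens")
```

On these assumptions, a 2.5 MB file corresponds to roughly 51,000 to 64,000 tokens, well below 150,000. Either the database held fewer tokens than the expiry threshold, or its per-token size differs from the rule of thumb; the file size alone cannot distinguish the two.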

NOTE: Bayes works fine for me this way, but my guess is that
with the small corpus and short expiry cycle I may see
erratic performance over time. Because my system seems to
get much more spam than ham, I end up with a 9:2
ratio of spam to ham as autolearn continues to feed incoming
email to the database.

But my main concern right now is trying to figure out why
my experience doesn't match what the documentation says I
should expect.

-Abigail



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
