On Fri, Sep 19, 2003 at 04:38:40PM -0400, Pete O'Hara wrote:
> Yes, I figured that if for some reason the 50k was too low that I should 
> end up with 100k, but here I have 165k and this is what is confusing me.
> 0.000          0     165010          0  non-token data: ntokens

Just remember, it's all "best effort", so you may end up with more
than max_db_size, or fewer than 100k tokens, depending on how the
calculations go, but the code does its best (surprise!) not to do that. ;)

> what would cause an old lock file and a bayes_toks.new that is
> static (not being written to and just hanging around)? - I have seen
> users with memory problems that cause this but they seem to have
> mail problems and database access issues that I don't have - the logs
> show that BAYES_XX tests are being utilized

Something dying partway through an expire or import.

> -- I believe auto_expiry but how do I know for sure (bayes_auto_expire 1 in
> -- /etc/mail/spamassassin/local.cf - which is being read - see below) 
> -- but it's not expiring AFAIK. I have bayes_expiry_max_db_size 50000. I know
> -- that with such a small size the result should be 100,000 tokens

Irrelevant.  You know auto-expire is running because it appears in the
-D output.  Whether or not it actually removes anything is a different
issue.

> -- have "bayes_expiry_max_db_size 50000" to try to force an auto expire 
> debug: bayes: expiry check keep size, 75% of max: 37500
> debug: bayes: expiry keep size too small, resetting to 100,000 tokens
> debug: bayes: token count: 165454, final goal reduction size: 65454
> debug: bayes: First pass?  Current: 1063914795, Last: 1063725493, atime: 1382400, count: 66791, newdelta: 1410637, ratio: 1.02042655911021
> -- why were 155616 tokens kept? should have been 100,000 I thought
> debug: expired old Bayes database entries in 84 seconds: 155616 entries kept, 9838 deleted

This is explained in the sa-learn docs, but in short: the 2.6x expiry
code tries to be efficient (and save time) by estimating time deltas
based on the previous expire run, on the assumption that your mail flow
is roughly constant, so each expire should cover roughly the same
number of tokens with roughly the same time delta.

The "First pass?" line shows the values used to figure out whether an
estimation is likely to work.  I'm not going to go into specifics
(see the EXPIRATION section of sa-learn's POD documentation), but with
the values listed above, SA decided that it could estimate based on the
previous expiry.  (Note: I just added a debug statement to the expire
code to say that it will use estimation and not run a first pass.)  By
doing so, however, it was only able to remove 9838 tokens, leaving
155616, i.e., you learned far fewer tokens in the last 2 weeks than in
the 2 weeks before that.
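
To make the numbers concrete, here is a rough sketch of the keep-size
arithmetic as it can be read off the debug lines quoted above.  This is
not the SpamAssassin code (which is Perl); the function name and the
constants are my own labels for illustration, and only the 75% rule,
the 100,000 floor, and the token counts come from the debug output.

```python
# Sketch only: reconstructs the arithmetic visible in the quoted -D output.
# Names are hypothetical; they are not SpamAssassin identifiers.

MIN_KEEP = 100_000  # floor, from "resetting to 100,000 tokens"


def goal_reduction(token_count, max_db_size, min_keep=MIN_KEEP):
    """Return (keep_size, tokens_to_remove) for one expire run."""
    keep = int(max_db_size * 0.75)   # "expiry check keep size, 75% of max"
    if keep < min_keep:              # "keep size too small, resetting"
        keep = min_keep
    return keep, max(token_count - keep, 0)


token_count = 165_454                # "token count: 165454"
keep, goal = goal_reduction(token_count, max_db_size=50_000)

# What the run actually achieved, from the final debug line.
kept, deleted = 155_616, 9_838
shortfall = goal - deleted           # tokens the estimate failed to remove

print(f"keep size: {keep}, goal reduction: {goal}")
print(f"deleted: {deleted}, short of goal by: {shortfall}")
```

So the goal was to drop 65454 tokens, and the estimated atime delta only
caught 9838 of them.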

The estimation, btw, calculated atime*count/goal_reduction
(1382400*66791/65454), which gave a new atime delta estimate of 1410637
seconds, or approximately 16 days.
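
If you want to check that arithmetic yourself, a quick sketch follows.
The variable names are mine, and the truncation to an integer is an
assumption made to match the "newdelta" value printed in the debug
output; the actual Perl code may handle rounding differently.

```python
# Sketch only: reproduces the estimate from the "First pass?" debug line.
SECONDS_PER_DAY = 86_400

atime = 1_382_400   # previous atime delta, in seconds (exactly 16 days)
count = 66_791      # tokens removed by the previous expire
goal = 65_454       # "final goal reduction size" for this run

ratio = count / goal                 # debug shows 1.02042655911021
new_delta = atime * count // goal    # assumed truncation to whole seconds

print(f"ratio: {ratio:.14f}")
print(f"new atime delta: {new_delta} s "
      f"(~{new_delta / SECONDS_PER_DAY:.1f} days)")
```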


Hope this helps. :)

-- 
Randomly Generated Tagline:
Condominiums are not effective birth control.
