On Thu, Oct 02, 2003 at 09:43:01AM +0300, Kai Risku wrote:
> debug: bayes: expiry check keep size, 75% of max: 112500
> debug: bayes: token count: 161438, final goal reduction size: 48938
> debug: bayes: First pass? Current: 1065075477, Last: 1065043573, atime: 0,
> count: 0, newdelta: 0, ratio: 0
> debug: bayes: something fishy, calculating atime (first pass)
> debug: bayes: couldn't find a good delta atime, need more token difference,
> skipping expire.
>
> It seems like the expiry code is having some kind of problem here. The
> database has been slowly accumulated from a live feed of emails over
> several weeks. Here is the beginning of "sa-learn --dump":
Yes. Expiry is a best-effort attempt, not a guarantee of max size. I put a
lot of detail about this into the sa-learn docs; in this case, look at the
"ESTIMATION PASS LOGIC" section. In short, the expiry system can't find an
appropriate atime delta that can be used without either too many tokens
being expired, or fewer than 1000 tokens being expired. So it wants more
spread in the token atimes, so that the next time it tries to expire it may
have a shot at something between 1000 and "too many".

> Can anybody shed some light on whether this is correct behaviour or not?
> My main problem is slowness when using the Bayes checks with timeouts
> occurring, and it might have to do with too large a database. Any other
> explanations are also welcome!

# of tokens shouldn't have an effect on speed. At least, I haven't seen
that happen. I have ~400k tokens in my DB at any point, and it runs just
as speedy as when 100k tokens were in there.

--
Randomly Generated Tagline:
"You can start by removing your clothes. Not without flowers and dinner."
    - Franklin and Ivonova on Babylon 5
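
P.S. For anyone curious about what that estimation pass is doing, here is a
rough sketch in Python of the general idea. This is a toy illustration with
thresholds and names I made up (MIN_EXPIRE, MAX_FACTOR, pick_expiry_delta),
not the actual SpamAssassin code. The goal figure comes straight from the
debug output above: 161438 tokens minus the 112500 keep size = 48938 tokens
to expire.

import time

MIN_EXPIRE = 1000   # expiring fewer tokens than this isn't worth a rewrite
MAX_FACTOR = 1.5    # hypothetical cutoff: more than goal * 1.5 is "too many"

def pick_expiry_delta(atimes, goal, now=None):
    """Try exponentially growing atime deltas (12h, 1d, 2d, 4d, ...) and
    return the first (delta, count) where the number of tokens older than
    delta lands between MIN_EXPIRE and goal * MAX_FACTOR. Return None if
    no delta qualifies -- the "couldn't find a good delta atime" case."""
    now = now or time.time()
    delta = 43200                      # start at 12 hours, in seconds
    for _ in range(12):                # 12 hours up to roughly 2.8 years
        # count tokens whose last-access time is older than the delta
        expired = sum(1 for atime in atimes.values() if now - atime > delta)
        if MIN_EXPIRE <= expired <= goal * MAX_FACTOR:
            return delta, expired
        delta *= 2
    return None

If the database is young and the atimes are bunched together, every delta
expires either nearly everything or nearly nothing, the window between 1000
and "too many" never gets hit, and expiry is skipped, which is exactly the
symptom in the debug output above.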