On Thu, Oct 02, 2003 at 09:43:01AM +0300, Kai Risku wrote:
> debug: bayes: expiry check keep size, 75% of max: 112500
> debug: bayes: token count: 161438, final goal reduction size: 48938
> debug: bayes: First pass? Current: 1065075477, Last: 1065043573, atime: 0,
> count: 0, newdelta: 0, ratio: 0
> debug: bayes: something fishy, calculating atime (first pass)
> debug: bayes: couldn't find a good delta atime, need more token difference,
> skipping expire.
>
> It seems like the expiry code is having some kind of problem here. The
> database has been slowly accumulated from a live feed of emails over
> several weeks. Here is the beginning of "sa-learn --dump":
Yes. Expiry is a best-effort attempt, not a guarantee of max size. I put a
lot of detail about this into the sa-learn docs; in this case, look at the
"ESTIMATION PASS LOGIC" section. In short, the expiry system can't find an
appropriate atime delta that can be used without either too many tokens
being expired, or fewer than 1000 tokens being expired. So it wants more
spread in the token atimes, so that the next time it tries to expire it may
have a shot at something between 1000 and "too many".

> Can anybody shed some light on whether this is correct behaviour or not?
> My main problem is slowness when using the Bayes checks with timeouts
> occurring, and it might have to do with too large a database. Any other
> explanations are also welcome!

# of tokens shouldn't have an effect on speed. At least, I haven't seen
that happen. I have ~400k tokens in my DB at any point, and it runs just
as speedy as when 100k tokens were in there.

--
Randomly Generated Tagline:
"You can start by removing your clothes. Not without flowers and dinner."
    - Franklin and Ivonova on Babylon 5
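
P.S. For anyone curious about what that estimation pass is doing, here is a
rough sketch in Python of the general idea. This is a toy illustration with
thresholds and names I made up (MIN_EXPIRE, MAX_FACTOR, pick_expiry_delta),
not the actual SpamAssassin code. The goal figure comes straight from the
debug output above: 161438 tokens minus the 112500 keep size = 48938 tokens
to expire.

import time

MIN_EXPIRE = 1000   # expiring fewer tokens than this isn't worth a rewrite
MAX_FACTOR = 1.5    # hypothetical cutoff: more than goal * 1.5 is "too many"

def pick_expiry_delta(atimes, goal, now=None):
    """Try exponentially growing atime deltas (12h, 1d, 2d, 4d, ...) and
    return the first (delta, count) where the number of tokens older than
    delta lands between MIN_EXPIRE and goal * MAX_FACTOR. Return None if
    no delta qualifies -- the "couldn't find a good delta atime" case."""
    now = now or time.time()
    delta = 43200                      # start at 12 hours, in seconds
    for _ in range(12):                # 12 hours up to roughly 2.8 years
        # count tokens whose last-access time is older than the delta
        expired = sum(1 for atime in atimes.values() if now - atime > delta)
        if MIN_EXPIRE <= expired <= goal * MAX_FACTOR:
            return delta, expired
        delta *= 2
    return None

If the database is young and the atimes are bunched together, every delta
expires either nearly everything or nearly nothing, the window between 1000
and "too many" never gets hit, and expiry is skipped, which is exactly the
symptom in the debug output above.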