After debugging the code a bit more, I came to the conclusion that
the power-of-two approach simply had trouble coping with my database.
Perhaps it had something to do with my recent upgrade from 2.55
to 2.60, but it seemed that most of the tokens had nearly the same
atime: expiring entries older than 16 days would only have expired
a few thousand, while expiring over 32 days would have expired
almost all entries. So the doubling approach made the jump
from 16 to 32 days too big, and it could not settle on either
time interval to expire on.
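
To illustrate the kind of dead end described here, below is a minimal
sketch of a doubling search over expiry intervals. It is not the actual
SpamAssassin code: the function names, the 12-hour base unit, the number
of steps and the sample data are all assumptions made for illustration.
Only the "at least 1000 tokens, but not too many" rule comes from Theo's
explanation quoted below.

```python
HALF_DAY = 12 * 60 * 60  # assumed base interval in seconds (illustrative)
DAY = 24 * 60 * 60


def count_expirable(atimes, now, delta):
    """Count tokens whose last access is more than `delta` seconds ago."""
    return sum(1 for atime in atimes if now - atime > delta)


def choose_expiry_delta(atimes, now, goal_reduction, min_expire=1000, steps=10):
    """Try deltas of HALF_DAY * 2**n and return one that expires between
    `min_expire` and `goal_reduction` tokens, or None if no step fits.

    Hypothetical helper: among the fitting steps it prefers the one that
    expires the most tokens.
    """
    best = None  # (delta, count) of the best candidate seen so far
    for n in range(steps):
        delta = HALF_DAY * 2 ** n
        count = count_expirable(atimes, now, delta)
        if min_expire <= count <= goal_reduction:
            if best is None or count > best[1]:
                best = (delta, count)
    return best[0] if best else None


now = 1065075477  # "Current" timestamp from the debug output

# Made-up database where almost all tokens share the same atime:
# one doubling step expires too few tokens, the neighbouring one too many.
clustered = [now - 20 * DAY] * 300 + [now - 10 * DAY] * 5000
clustered_choice = choose_expiry_delta(clustered, now, goal_reduction=2000)

# Made-up database with atimes spread evenly over 60 days: a step fits.
spread = [now - age * DAY for age in range(1, 61) for _ in range(100)]
spread_choice = choose_expiry_delta(spread, now, goal_reduction=3000)
```

With the clustered data no step lands between the two limits, so the
sketch returns None, which corresponds to "skipping expire" in the debug
output. With the evenly spread data it settles on the 32-day step.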

Incidentally, now that some time has passed, the 16-day limit suddenly
started to make sense, so the bayes database actually got some
tokens expired today... ;)

As for the timeout problems, I found that the bayes journal is not
enabled by default. All my timeout problems vanished once I started
to use the journal, because the parallel SA processes were then
able to share the database better.
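
For anyone wanting to try the same thing, the fragment below shows what
enabling the journal might look like in local.cf. The option name is an
assumption on my part (I believe it is bayes_learn_to_journal), so check
it against the Mail::SpamAssassin::Conf documentation for your version
before relying on it.

```
# local.cf fragment (option name assumed, verify for your SA version)
# Write Bayes learning updates to the journal instead of directly
# into the database, to reduce lock contention between processes.
bayes_learn_to_journal 1
```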

--
[EMAIL PROTECTED]     GSM  +358-40-767 8282
Oy Arrak Software Ab   http://www.arrak.fi
 

> -----Original Message-----
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, October 07, 2003 7:39 PM
> To: Kai Risku
> Cc: [EMAIL PROTECTED]
> Subject: Re: [SAtalk] Bayes problems and expiring
> 
> 
> On Thu, Oct 02, 2003 at 09:43:01AM +0300, Kai Risku wrote:
> > debug: bayes: expiry check keep size, 75% of max: 112500
> > debug: bayes: token count: 161438, final goal reduction size: 48938
> > debug: bayes: First pass?  Current: 1065075477, Last: 1065043573, atime: 0, count: 0, newdelta: 0, ratio: 0
> > debug: bayes: something fishy, calculating atime (first pass)
> > debug: bayes: couldn't find a good delta atime, need more token difference, skipping expire.
> > 
> > It seems like the expiry code is having some kinds of problems here.
> > The database has been slowly accumulated from a live feed of emails 
> > for a duration of several weeks. Here is the beginning of "sa-learn 
> > --dump" :
> 
> Yes.  Expiry is a best effort attempt, not a guarantee of max 
> size. I wrote in lots of detail into the sa-learn docs.  In 
> this case, look at the "ESTIMATION PASS LOGIC" section.
> 
> In short, the expiry system can't find an appropriate time 
> unit that can be used without either too many tokens being 
> expired, or less than 1000 tokens being expired.  So it wants 
> more token atime differences so the next time it tries to 
> expire it may have a shot at something between 1000 and "too many".
> 
> > Can anybody shed some light on whether this is correct behaviour or 
> > not? My main problem is slowness when using the Bayes checks with 
> > timeouts occurring, and it might have to do with too large a database.
> > Any other explanations are also welcome!
> 
> # of tokens shouldn't have an effect on speed.  At least, I 
> haven't seen that happen.  I have ~400k tokens # in my DB at 
> any point, and it runs just as speedy as when 100k tokens 
> were in there.
> 
> -- 
> Randomly Generated Tagline:
> "You can start by removing your clothes.
>   Not without flowers and dinner."       - Franklin and Ivonova on Babylon 5
> 


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
