Has anybody considered revising the Bayes expiration logic?  Maybe it's just 
our data that's weird, but the built-in expiration logic doesn't seem to work 
very well for us.  Here are my observations:

There's no point in checking anything older than oldest_atime.  At that age or 
older, zero tokens will be expired.  The current estimation pass logic goes 
back 256 days, even if the oldest atime is one week old and the calculations 
have already started returning zeros.

If your target corresponds to a delta of more than a few days, you're unlikely 
to get very close to it, because the estimation pass uses exponentially 
increasing intervals.  There can be a big difference between a delta of 8 days 
and a delta of 16 days.
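
To make the granularity concrete, here's a rough Python sketch of an estimation 
pass that doubles a 12-hour base period out to 256 days, plus the obvious clamp 
at the oldest atime.  The helper names and the exact ladder are my assumptions 
for illustration, not the actual code:

    HOURS = 3600
    DAYS = 24 * HOURS

    def estimation_deltas(max_delta=256 * DAYS, base=12 * HOURS):
        """Yield the exponentially spaced deltas an estimation pass would try."""
        delta = base
        while delta <= max_delta:
            yield delta
            delta *= 2          # 12h, 1d, 2d, 4d, ..., 256d

    def useful_deltas(oldest_atime_age, max_delta=256 * DAYS):
        """Skip deltas older than the oldest atime: they can only expire 0 tokens."""
        return [d for d in estimation_deltas(max_delta) if d < oldest_atime_age]

    if __name__ == "__main__":
        # The full ladder jumps straight from 8 days to 16 days (coarse granularity),
        # and with an oldest atime of one week, everything past ~7 days is wasted work.
        print([d // DAYS for d in estimation_deltas()])        # 0, 1, 2, 4, 8, ..., 256 (days)
        print([d // HOURS for d in useful_deltas(7 * DAYS)])   # only 12h, 1d, 2d, 4d survive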

The initial "guesstimate" algorithm can choose a delta that's older than the 
oldest atime, which will result in the dreaded "expired 0E0 tokens".  
Conversely, it can choose a delta so new that far too many tokens are expired.  
You're guaranteed to have at least 100,000 tokens left, but that's not good 
enough if you have set the max DB size to a million or more.

I suggest using a binary search or perhaps linear interpolation (a sketch 
follows the list below).  The starting endpoints would be:

1) Oldest atime.  We already know it will expire zero tokens.

2) 12 hours ago.  Calculate the number of tokens expired for this value.  If it 
expires too few, use this as your delta (or quit if it expires fewer than 
1,000).  If it expires too many, you have your two endpoints to begin the 
search.  You can decide when to quit by closeness to the target, the size of 
the interval, the number of iterations, or some combination.
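
Here's a rough Python sketch of what I mean.  The count_expired() helper stands 
in for however the expiry code counts tokens older than a given delta, and the 
10% tolerance, 6-hour minimum interval, and 10-iteration cap are made-up 
numbers, not a proposal for specific values:

    HOURS = 3600

    def find_delta(count_expired, oldest_atime_age, target,
                   min_expire=1000, tolerance=0.10, max_iters=10):
        """Search for an atime delta whose expiry count lands near `target`."""
        lo = 12 * HOURS            # newest candidate delta: expires the most tokens
        hi = oldest_atime_age      # oldest candidate delta: known to expire zero tokens

        expired_lo = count_expired(lo)
        if expired_lo < min_expire:
            return None            # too few tokens to bother expiring at all
        if expired_lo <= target:
            return lo              # even the newest delta expires too few: just use it

        expired_hi = 0             # nothing at or beyond the oldest atime can expire
        for _ in range(max_iters):
            # Linear interpolation between the endpoints; a plain binary search
            # would use mid = (lo + hi) / 2 here instead.
            mid = lo + (expired_lo - target) * (hi - lo) / (expired_lo - expired_hi)
            expired_mid = count_expired(mid)

            if abs(expired_mid - target) <= tolerance * target:
                return mid         # close enough to the target count
            if expired_mid > target:
                lo, expired_lo = mid, expired_mid   # still too many: try an older delta
            else:
                hi, expired_hi = mid, expired_mid   # too few: try a newer delta
            if hi - lo < 6 * HOURS:
                break              # interval is too small to be worth refining
        return lo                  # fall back to the endpoint that slightly over-expires

Swapping the interpolation line for a simple midpoint gives you the 
binary-search variant; everything else stays the same.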

The only problem I've seen is that the token age distribution can be nonlinear 
enough that linear interpolation doesn't converge well in some cases, and I 
don't know the best way to introduce fudge factors to work around that.
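
One possible fudge, offered only as a guess: safeguard each interpolation step 
by falling back to plain bisection whenever the interpolated point hugs an 
endpoint, so a badly skewed age distribution can't stall the search.  Something 
like this (again, the names are illustrative):

    def safeguarded_step(lo, hi, expired_lo, expired_hi, target, min_fraction=0.1):
        """Interpolate, but never let the step land too close to an endpoint."""
        mid = lo + (expired_lo - target) * (hi - lo) / (expired_lo - expired_hi)
        lower = lo + min_fraction * (hi - lo)
        upper = hi - min_fraction * (hi - lo)
        if mid < lower or mid > upper:
            mid = (lo + hi) / 2     # fall back to bisection when interpolation stalls
        return mid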
