Has anybody considered revising the Bayes expiration logic? Maybe it's just our data that's weird, but the built-in expiration logic doesn't seem to work very well for us. Here are my observations:
- There's no point in checking anything older than oldest_atime; for that value and anything older, zero tokens will be expired. Yet the current estimation pass goes back 256 days, even if the oldest atime is one week old and the calculations have already started returning zeroes.

- If your target corresponds to a delta of more than a few days, you're unlikely to get very close to it, because the estimation pass uses exponentially increasing intervals. There can be a big difference in the number of tokens expired between a delta of 8 days and one of 16 days.

- The initial "guesstimate" algorithm can choose a delta older than the oldest atime, which results in the dreaded "expired 0E0 tokens". Conversely, it can choose a delta so new that far too many tokens are expired. You're guaranteed to have at least 100,000 tokens left, but that's not good enough if you've set the max DB size to a million or more.

I suggest using a binary search, or perhaps linear interpolation, with these starting endpoints:

1) The oldest atime. We already know it will expire zero tokens.

2) 12 hours ago. Calculate the number of tokens this value would expire. If it expires too few, use it as your delta (or quit if it expires fewer than 1,000). If it expires too many, you now have two endpoints bracketing the target and can begin the search.

You can decide when to quit by closeness to the target, the size of the interval, the number of iterations, or some combination; a rough sketch of what I have in mind is below. The only problem I've seen is that the token age distribution is nonlinear enough that there are cases where linear interpolation doesn't converge very well, and I don't know the best way to introduce fudge factors to get around that.
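Here's a sketch of the search I'm describing, written in Python only to keep it short (the real change would of course go in the Perl Bayes expiry code). The estimate_expired() helper, the 1,000-token floor, and the 10% tolerance are placeholders for illustration; the real counting routine would be whatever the estimation pass already uses to count tokens older than a cutoff.

import time

def estimate_expired(delta, token_atimes, now=None):
    """Hypothetical stand-in for the estimation pass: count tokens whose
    atime is older than now - delta."""
    if now is None:
        now = time.time()
    cutoff = now - delta
    return sum(1 for atime in token_atimes if atime < cutoff)

def pick_expiry_delta(token_atimes, target, max_iters=10, tolerance=0.10):
    """Choose an expiry delta (in seconds) by searching between two endpoints:
    the oldest atime (expires zero tokens) and 12 hours ago (expires the most)."""
    now = time.time()

    # Endpoint 1: the oldest atime.  Any delta this large or larger expires
    # zero tokens, so it bounds the "old" end of the search.
    old_delta = now - min(token_atimes)
    old_count = 0

    # Endpoint 2: 12 hours ago, the "new" end of the search.
    new_delta = 12 * 3600
    new_count = estimate_expired(new_delta, token_atimes, now)

    if new_count < 1000:        # not enough tokens to bother expiring at all
        return None
    if new_count <= target:     # even the newest cutoff expires too few: use it
        return new_delta

    delta = new_delta
    for _ in range(max_iters):
        # Linear interpolation on the token count; fall back to plain
        # bisection if the two endpoint counts ever coincide.
        if new_count != old_count:
            frac = (new_count - target) / (new_count - old_count)
            delta = new_delta + frac * (old_delta - new_delta)
        else:
            delta = (new_delta + old_delta) / 2.0

        count = estimate_expired(delta, token_atimes, now)

        # Quit when we're within tolerance of the target.
        if abs(count - target) <= tolerance * target:
            break

        if count > target:      # still expiring too many: move the "new" end older
            new_delta, new_count = delta, count
        else:                   # expiring too few: move the "old" end newer
            old_delta, old_count = delta, count

    return delta

Using plain bisection instead of interpolation would sidestep the nonlinearity problem at the cost of a couple of extra counting passes, which may be the simplest "fudge factor" of all.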