Justin Mason wrote on Fri, 08 Oct 2004 09:39:15 -0700:

> So you wind up with a very big, but unexpirable, db?
Yes. I can expire it with the trick mentioned, but then it blows away most of the db. And the next expire fails again until I play other tricks or wait long enough. For instance, I can dump the stuff, change the atime delta to a value which would make sa-learn expire a few tokens, and then import the whole dump. I did this once in the past when I had a database which contained negative values and values in the future. I stripped all those wrong tokens, but the expire would still not work, because it uses the last expire atime delta as a parameter in the expire calculation. So I had to calculate a delta which would stop it thinking "fishy" (= produce a good ratio, if I remember right), replace it in the dump, and then import all of that again.

Until a better method is found, it would help immensely if we could provide sa-learn with "faked" values, so that it doesn't go into the "fishy" iterations at all. Or just to be able to give it an expire atime delta to use instead of it trying to calculate one itself.

> I think that would be worth a bug, yes.
>
> in my opinion, expiry should always do *something* to get the db
> below a target size, even if that *something* isn't strictly token
> removal by atime.

Yes, at least I should be able to rely on the expiry. For instance, the problem now is that with auto-expiry it suddenly hits the threshold and tries to expire on every new pass - and can't. This is like a DoS. When I shut off auto-expiry to avoid this, it doesn't do any good other than avoiding the processing delay: I still can't expire.

Proposal: I think a good way would be to use a removal percentage plus a *minimum*, both configurable in local.cf. The db then needs to be sorted before expiry, which I think we don't do now. This takes a bit of time but is much more reliable. Auto-expiry could still use the current method *when it works*, but stop doing any iterations when it detects something "fishy". This would stop those massive time-outs. A forced expire would then use the percentage method *only*. This way I could switch off auto-expire and run a (for instance) 1% expire each night until the db reaches the minimum. If the bayes db is at the minimum it won't expire at all, so a given minimum would avoid even the small chance of slashing my db to almost zero if it grows too slowly.

Once the db is sorted, you can start removing at the beginning of the file and stop when you reach the reduction goal. For instance: db of 1,000,000 tokens, reduction percentage 1%, minimum 500,000 => reduction goal = 10,000. Of course, if we stop right at the 10,000th removal we are likely to keep some tokens with the same atime we just removed. But does this really matter? We could also take the atime of the last token removed before stopping and expire all remaining tokens with that atime. So, instead of removing exactly 10,000 tokens we may remove 10,312. Again, I think it doesn't matter. (See the sketch below.)

Does this proposal sound reasonable? I could then file a bug outlining it.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
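
A minimal sketch of the percentage-plus-minimum expiry described above, purely for illustration - it is written in Python rather than SpamAssassin's Perl, and every name, default value and the (atime, token) tuple layout here are assumptions of mine, not existing SpamAssassin code or configuration:

#!/usr/bin/env python3
# Illustrative sketch only; not SpamAssassin code.

def expiry_goal(db_size, reduce_pct, minimum):
    """Number of tokens to remove: reduce_pct of the db, but never
    shrink it below `minimum`.  E.g. 1,000,000 tokens, 1%, minimum
    500,000 -> goal 10,000; a db already at the minimum yields 0."""
    return max(0, min(int(db_size * reduce_pct), db_size - minimum))

def expire(tokens, reduce_pct=0.01, minimum=500000,
           finish_atime_run=True):
    """tokens: list of (atime, token) pairs.  Sort by atime and drop
    from the oldest end until the goal is reached.  With
    finish_atime_run set, also drop the remaining tokens that share
    the atime of the last one removed, so slightly more than the goal
    may go (e.g. 10,312 instead of 10,000)."""
    goal = expiry_goal(len(tokens), reduce_pct, minimum)
    if goal == 0:
        return tokens                  # already at or below the minimum
    tokens = sorted(tokens)            # oldest atime first
    cut = goal
    if finish_atime_run:
        last_atime = tokens[goal - 1][0]
        while cut < len(tokens) and tokens[cut][0] == last_atime:
            cut += 1
    return tokens[cut:]

if __name__ == "__main__":
    # Toy run: 100 tokens, 10% reduction, minimum of 50 -> removes 10,
    # or a few more if the 10th removal falls inside an atime run.
    toy = [(1000 + i // 3, "tok%d" % i) for i in range(100)]
    print(len(toy), "->", len(expire(toy, reduce_pct=0.10, minimum=50)))

Either way the expiry does *something* bounded and predictable: the goal is capped so the db never drops below the configured minimum, and removal is a single pass over the sorted dump instead of the current atime-delta calculation with its "fishy" iterations.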