On Fri, 9 Nov 2018 15:34:47 -0500
Kris Deugau wrote:

> Amir Caspi wrote:
> > On Nov 9, 2018, at 8:10 AM, Matus UHLAR - fantomas
> > <uh...@fantomas.sk> wrote:
> >>
> >> how many spams and hams did you train then?
> >
> > As of right now:
> > 0.000 0 258427 0 non-token data: nspam
> > 0.000 0 106813 0 non-token data: nham
> > 0.000 0 438310 0 non-token data: ntokens
> >
> >> I have increased to this number, on some servers even to double of
> >> that number.
> >
> > I increased to your recommendation, so per above, am now storing
> > more tokens... hopefully this helps.
>
> My target for tweaking bayes_expiry_max_db_size at work has been to
> try to hit no more than 5-10% daily churn in tokens; IIRC I've asked
> once or twice but nobody else has spoken up with any of their own
> rules of thumb. Right now it's probably a bit high at 2450000 (given
> that every so often, there are a couple of days with no tokens
> expired), but the default of 250K was far too low.

The default is actually 150,000, not 250K. IIRC his retention was 64 days, which isn't too bad. I'd take it up to 300,000 and see how it goes. The standard expiry algorithm isn't designed to handle very long retention, and it may stop working altogether. IIRC, the retention at the target size of 0.75*bayes_expiry_max_db_size should be less than 256 days.
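For anyone wanting to run the numbers before changing local.cf, here is a rough back-of-the-envelope sketch. Only the 0.75 expiry target, the 5-10% daily churn rule of thumb, and the ~256-day retention ceiling come from this thread; the function names are hypothetical, and the retention estimate assumes a steady state where roughly the same number of tokens is expired every day. It is a crude approximation, not how SpamAssassin itself computes expiry.

```python
# Rough sizing arithmetic for bayes_expiry_max_db_size.
# Assumptions (from this thread, not from SpamAssassin's code):
#   - expiry trims the DB to about 75% of bayes_expiry_max_db_size
#   - a daily churn of 5-10% of tokens is a reasonable target
#   - retention at the target size should stay under ~256 days
# All function names here are hypothetical helpers.

EXPIRY_TARGET_FRACTION = 0.75
MAX_RETENTION_DAYS = 256


def expiry_target(max_db_size: int) -> int:
    """Approximate token count left after an expiry run."""
    return int(max_db_size * EXPIRY_TARGET_FRACTION)


def daily_churn(tokens_expired_per_day: float, max_db_size: int) -> float:
    """Fraction of the post-expiry token set replaced per day."""
    return tokens_expired_per_day / expiry_target(max_db_size)


def approx_retention_days(tokens_expired_per_day: float, max_db_size: int) -> float:
    """Crude steady-state estimate: target size divided by daily expiry."""
    return expiry_target(max_db_size) / tokens_expired_per_day


def retention_ok(tokens_expired_per_day: float, max_db_size: int) -> bool:
    """True if the estimated retention stays under the ~256-day ceiling."""
    return approx_retention_days(tokens_expired_per_day, max_db_size) < MAX_RETENTION_DAYS


if __name__ == "__main__":
    for size in (150_000, 300_000):
        print(size, "->", expiry_target(size), "tokens after expiry")
```

With the default of 150,000 the target works out to 112,500 tokens; at 300,000 it is 225,000. Expiring 3,000 tokens a day at 300,000 would then suggest roughly 75 days of retention, comfortably under the ceiling.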