Hi,

>> Well, what's the missing 120 MB? The journal? Do a complete sync and
>> then delete it.
>
> Probably the signatures in bayes_seen - there's no mechanism for ageing
> them out.
And I assume that isn't a problem then?

>> "too big" is not an absolute figure. If you store 1-occurrence tokens
>> you will obviously have more tokens than without them.
>
> There's not really a choice since all tokens start that way.

Maybe a better estimate would be in terms of time. How long should the
tokens that have only occurred once remain in the database? Perhaps
that's a better metric. For me it's about a week now.

>> You should use autolearn if you don't already.
>
> Autolearning can make things worse by dropping the retention period.

Yes, I'm using autolearn, but how does that affect the retention period?
What do the two have to do with each other? Do you mean auto-expire, not
auto-learn?

My database seems to have improved slightly over the past few days after
increasing the max db size to 1.6M. I guess there is also a lot of expiry
pending, because the database is currently much larger than that:

  0.000          0    2050481          0  non-token data: ntokens

Looks like about 345k to be purged, if I understand correctly?

Thanks,
Alex
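
P.S. In case it helps anyone else reading the thread, here is a rough
sketch of what I'm looking at on my end. The 1.6M ceiling is the value I
mentioned above; exact numbers and whether you want to force an expiry by
hand will obviously vary with your setup:

  # in local.cf: raise the token ceiling, leave automatic expiry on
  bayes_expiry_max_db_size 1600000
  bayes_auto_expire        1

  # sync the journal into the database, then force an expiry run
  sa-learn --sync
  sa-learn --force-expire

  # check the token count afterwards (the "ntokens" line in the output)
  sa-learn --dump magic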