Hi,

>> Well, what's the missing 120 MB? The journal? Do a complete sync and
>> then delete it.
>
> Probably the signatures in bayes_seen - there's no mechanism for ageing
> them out.

And I assume that isn't a problem then?

>> "too big" is not an absolute figure. If you store 1-occurence tokens
>> you will obviously have more tokens than without them.
>
> There's not really a choice since all tokens start that way.

Maybe a better measure would be in terms of time: how long should
tokens that have only been seen once remain in the database? Perhaps
that's a good metric. For me it's about a week now.
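As far as I can tell there is no fixed TTL for them; expiry seems to be
driven by token access time and the size cap, so the week or so I'm
seeing just falls out of those. For reference, this is roughly what I
have in local.cf (my guess at the relevant knobs, values illustrative):

  # local.cf - Bayes expiry settings (sketch; adjust to taste)
  bayes_auto_expire          1        # let SA run opportunistic expiry
  bayes_expiry_max_db_size   1600000  # token count ceiling that triggers expiry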

>> You should use autolearn if you don't do yet.
>
> Autolearning can make things worse by dropping the retention period.

Yes, I'm using autolearn, but how does that affect the retention
period? What do the two have to do with each other? Do you mean
auto-expire, not auto-learn?
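For what it's worth, they look like separate settings here, which is why
I asked (option names from the docs, my values are only illustrative):

  bayes_auto_learn                    1     # train Bayes from high/low scoring mail
  bayes_auto_learn_threshold_spam     12.0  # learn as spam at/above this score
  bayes_auto_learn_threshold_nonspam  0.1   # learn as ham at/below this score
  bayes_auto_expire                   1     # separate knob: opportunistic expiry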

My database seems to have improved slightly over the past few days
after increasing the max db size to 1.6M. I guess there is also a lot
of expiry pending, because the database is currently much larger than
that:

0.000          0    2050481          0  non-token data: ntokens

Looks like about 450k tokens (2,050,481 minus the 1.6M limit) to be
purged, if I understand correctly?
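In case it helps anyone else, this is how I'm checking it (standard
sa-learn invocations, run as the user that owns the Bayes db):

  # current token count (the ntokens line above comes from this)
  sa-learn --dump magic | grep ntokens

  # run expiry now instead of waiting for opportunistic expiry
  sa-learn --force-expire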

Thanks,
Alex