On Sat, 12 Dec 2015 13:29:40 +0100
Axb wrote:

> On 12/12/2015 01:08 PM, Reindl Harald wrote:
  
> >> I hate stale data... that's all  

But you do keep stale data in the retained tokens. What you are getting
rid of is the contribution from old mail that is least likely to make a
difference to any classification. Expiry is about managing database
size; if it were about expiring stale information it would be
implemented differently.

> > practical reasons?
> > it's a computer  
> performance... If I keep accessing X years of stale data my scanning
> times go through the roof.

The time taken to look up n tokens in a database containing m tokens
shouldn't depend strongly on m. There's something wrong if it does.
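
As a rough illustration of that point, here is a minimal Python sketch.
It assumes a hash-backed store: a plain dict stands in for the real
Bayes backend, and lookup_tokens and the token names are made up for the
example. The work done is one probe per token in the message, whatever
the size of the database.

```python
def lookup_tokens(db, tokens):
    """Return the stored (spam, ham) counts per token; unknown tokens map to None."""
    # One hash probe per message token: cost scales with len(tokens), not len(db).
    return {tok: db.get(tok) for tok in tokens}


# Hypothetical stores of very different sizes that share the same two known tokens.
small_db = {"viagra": (40, 1), "meeting": (2, 55)}
large_db = dict(small_db)
large_db.update({f"filler{i}": (1, 1) for i in range(200_000)})

message = ["viagra", "meeting", "unseen"]
small_result = lookup_tokens(small_db, message)
large_result = lookup_tokens(large_db, message)
```

Both calls do three probes and return the same answer. A backend using
an indexed table should behave similarly, with lookup cost growing only
slowly (roughly logarithmically) in m.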

> > financial reasons?
> > if you mean performance  
> 
> no... money.. If I see 15 million msgs/day and keep the Bayes data
> which those millions provided over a decade or more, I'd be in the TB
> amount of data... I couldn't really justify requesting servers with
> TBs RAM. Accounting would put me in the looney house.

The number of tokens depends on how many messages you train on, not on
how many you scan.
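
A toy sketch of that distinction, in Python. This is not how any real
Bayes backend is written: ToyBayesStore, learn and scan are hypothetical
names, and the sketch assumes training is the only operation that writes
(any automatic learning counts as training here).

```python
from collections import defaultdict


class ToyBayesStore:
    """Hypothetical token store: token -> [spam_count, ham_count]."""

    def __init__(self):
        self.tokens = defaultdict(lambda: [0, 0])

    def learn(self, tokens, is_spam):
        # Training writes: each distinct token in the message is added or updated.
        for tok in set(tokens):
            self.tokens[tok][0 if is_spam else 1] += 1

    def scan(self, tokens):
        # Scanning only reads: tokens never seen in training are skipped,
        # and the membership test does not insert them.
        return [tuple(self.tokens[tok]) for tok in tokens if tok in self.tokens]


store = ToyBayesStore()
store.learn(["cheap", "pills"], is_spam=True)
size_after_training = len(store.tokens)

# Scan many messages, each carrying a token the store has never seen.
for i in range(1000):
    store.scan(["cheap", f"novel{i}"])

size_after_scanning = len(store.tokens)
```

The store holds the same two tokens before and after the thousand scans;
only another call to learn would make it grow.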

