On Fri, 16 May 2014 21:36:22 -0600
Bob Proulx wrote:

> David Jones wrote:
> > > James B. Byrne wrote:
> > > If you keep Bayes well trained (assuming you have enough ham to
> > > do so) Bayes poisoning is a myth.
> >
> > I'm not sure I agree with the "myth" statement.  I just had to
> > reset my Bayes DB after years of it slowly drifting due to bad user
> > input and such.

That's mistraining. So-called "Bayes poisoning" is aimed at affecting
the classification.

> Years?  How far back does your Bayes db store data?
..
> My Bayes db only has the last month's of data in it.  That is a
> completely stock configuration.  I think the storage is actually by
> number of tokens not age though.  It would be great if someone could
> explain that in better detail.

It is managed by number. Each token has an access time which records
when it last contributed to a classification or appeared in a learned
email; the least recently seen tokens get purged. What gets purged is a
mixture of obsolete (often ephemeral) tokens and the long tail of
infrequently seen tokens.

Having a month of retention doesn't mean that you only have a month of
data, because the most important tokens never get purged and so contain
information that can go back years.

If I were you I'd increase the number of tokens; 150,000 is pretty
small. Some tokens are characteristic of senders or the mail servers
they use, and retaining those signature tokens helps to identify ham
and avoid FPs. I wouldn't like to keep the retention below, or close
to, a month because some ham is sent monthly.
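
To make the purge-by-access-time idea concrete, here is a rough Python
sketch. It is not the real expiry code, which is more involved; the
TokenStore class, its method names and the max_tokens cap are all made
up for illustration. If this is a stock SpamAssassin setup, I believe
the 150,000 figure corresponds to the bayes_expiry_max_db_size setting,
but treat that as an assumption and check your own configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    spam_count: int = 0
    ham_count: int = 0
    atime: int = 0  # last time the token was learned or used to classify


@dataclass
class TokenStore:
    max_tokens: int  # hypothetical cap standing in for the token limit
    tokens: dict = field(default_factory=dict)

    def learn(self, words, is_spam, now):
        """Count each distinct word once and stamp its access time."""
        for w in set(words):
            t = self.tokens.setdefault(w, Token())
            if is_spam:
                t.spam_count += 1
            else:
                t.ham_count += 1
            t.atime = now

    def touch(self, words, now):
        """Refresh atime for known tokens that contributed to a classification."""
        for w in set(words):
            if w in self.tokens:
                self.tokens[w].atime = now

    def expire(self):
        """Drop the least recently seen tokens until the store fits the cap."""
        excess = len(self.tokens) - self.max_tokens
        if excess <= 0:
            return []
        oldest = sorted(self.tokens, key=lambda w: self.tokens[w].atime)[:excess]
        for w in oldest:
            del self.tokens[w]
        return oldest


store = TokenStore(max_tokens=2)
store.learn(["invoice", "acme-mailer"], is_spam=False, now=1)  # monthly ham
store.learn(["pills-x9z"], is_spam=True, now=2)                # ephemeral spam token
store.learn(["cheap", "watches"], is_spam=True, now=3)
store.touch(["invoice", "acme-mailer"], now=31)                # next month's ham arrives
purged = store.expire()
print(sorted(purged))        # ['cheap', 'pills-x9z', 'watches']
print(sorted(store.tokens))  # ['acme-mailer', 'invoice']
```

The sender's signature tokens survive because the second monthly
message refreshed their access time just before the purge. With a
retention window shorter than a month they would have been the oldest
entries and gone first, which is the reason for not cutting it that
close.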