Rob,

> Because bayes_seen was also quite big I read up on that too.
> Since the table doesn't include any age information and (most)
> everything I found says "just delete it", I emptied the table.
> Although I think it's strange to just throw away information about
> previous seen messages that have been classified as either spam or
> ham. Any other insight in this would be valued..
No need to bother with bayes_seen, just purge it every once in a while
when it grows large.

> > Some people include atime information for that purpose.
>
> Yes, thanks.. I ran into a post that mentioned that some time after I
> posted, and added such field which will indeed do what I want. (It isn't
> going to help with the imported data though, because that info is not
> available in the original bdb files.)

The main purpose of bayes_seen is to prevent a stream of same-contents
messages arriving in short succession from polluting a bayes database.
It is unlikely that a same-contents message arrives more than once
over a long interval, and even if it does, not much harm is done if
it is re-learnt.

I believe bayes_seen had its purpose when mail viruses were frequent
and spam messages were arriving in non-personalized batches. Those
times are long gone.

Mark
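
For what it's worth, the age-based purge discussed above could be
sketched roughly as follows. This is only an illustration, not anything
shipped with SpamAssassin: it uses an in-memory SQLite table as a
simplified stand-in for bayes_seen, and the seen_at column, the
purge_old_seen() helper and the 30-day cutoff are all assumptions made
up for the example. Adapt the names to whatever field was actually
added to your schema.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def purge_old_seen(conn, max_age_days, now=None):
    """Delete rows older than max_age_days; return how many were removed.

    Assumes a hypothetical seen_at column holding a Unix timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    cur = conn.execute("DELETE FROM bayes_seen WHERE seen_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    # Simplified stand-in table; seen_at is the hypothetical added field.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_seen ("
        " id INTEGER NOT NULL,"
        " msgid TEXT NOT NULL,"
        " flag TEXT NOT NULL,"
        " seen_at REAL NOT NULL,"
        " PRIMARY KEY (id, msgid))"
    )
    now = 1_000_000_000.0  # fixed reference time so the demo is repeatable
    rows = [
        (1, "old-spam@example", "s", now - 90 * SECONDS_PER_DAY),
        (1, "old-ham@example", "h", now - 45 * SECONDS_PER_DAY),
        (1, "recent-spam@example", "s", now - 2 * SECONDS_PER_DAY),
    ]
    conn.executemany("INSERT INTO bayes_seen VALUES (?, ?, ?, ?)", rows)
    deleted = purge_old_seen(conn, 30, now=now)
    remaining = [r[0] for r in conn.execute("SELECT msgid FROM bayes_seen")]
    return deleted, remaining
```

Calling demo() removes the two entries older than 30 days and keeps the
recent one. Rows imported without any age information would need a
default timestamp (or a one-off full purge) before a cutoff like this
can apply to them.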