Bob Proulx wrote:
> Having a false positive every now and again is nothing new and I
> would train on error when that would occur and correct the issue.
> The problem I am seeing now is that when I train the message the
> Bayes engine does not learn the message as ham and still classifies
> the message as 99% likely to be spam.
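For reference, one way to verify that a retrain actually registered is
to compare the corpus counters before and after (the message path here
is just a placeholder):

  sa-learn --ham /path/to/false-positive.eml    # retrain the false positive as ham
  sa-learn --dump magic | grep -E 'nspam|nham'  # inspect the corpus counters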
Is there any way to reduce the expiry time of tokens?

> [15528] dbg: bayes: corpus size: nspam = 95535, nham = 38741

I am handling a reasonably large number of messages, and I think that
is creating an imbalance between spam and ham: there is so much spam
that it overwhelms the statistics for the non-spam. If the expiry
time were reduced, that would cap how much the spam can outweigh the
non-spam and bring the two back into balance.

I read through the expiration section of the sa-learn man page, but
unfortunately I did not comprehend enough of it. I suspect that what
I want to do is not really exposed at the moment.

Any hints on how to keep the Bayes database tuned up to avoid false
positives?

Thanks
Bob
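P.S. For completeness, here is a minimal sketch of the expiry knobs I
have found so far (the values are examples, not recommendations, and I
am assuming the default automatic expiry is in effect):

  # local.cf -- shrink the token database so expiry discards old tokens sooner
  bayes_auto_expire 1              # automatic expiry (this is the default)
  bayes_expiry_max_db_size 100000  # target token count after an expiry run
                                   # (the default is 150000)

An expiry run can also be triggered by hand:

  sa-learn --force-expire

Lowering bayes_expiry_max_db_size makes each expiry pass discard older
tokens more aggressively, which seems to be the closest available
control to "reducing the expiry time".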