Bob Proulx wrote:
> Having a false positive every now and again is nothing new, and I
> would train on the error when it occurred and correct the issue.
> The problem I am seeing now is that when I train the message, the
> Bayes engine does not learn the message as ham and still classifies
> the message as 99% likely to be spam.

Is there any way to reduce the expiry time of tokens?

> [15528] dbg: bayes: corpus size: nspam = 95535, nham = 38741

I am handling a reasonably large number of messages.  I think this is
creating an imbalance between spam and ham: there is so much more
spam that it overwhelms the statistics for the non-spam.  If the
expiry time were reduced, that would place a cap on how much spam can
accumulate relative to the non-spam messages and bring the two back
into balance.
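
If I am reading things right, expiry in SpamAssassin is driven by the
size of the token database rather than by a wall-clock time, so I was
guessing that something like this in local.cf would have the effect I
want (the size here is just a number I made up for illustration):

    # keep auto-expiry enabled and shrink the token cap so that
    # older spam tokens age out of the database sooner
    bayes_auto_expire        1
    bayes_expiry_max_db_size 100000

But I am not at all sure that is the right knob.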

I read through the expiration section of the sa-learn man page, but
unfortunately I did not understand enough of it.  I suspect that what
I want to do is not really exposed at the moment.
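
The two invocations I did manage to pick out of it were these, which
as far as I can tell only report on the database and force an expiry
pass to run, rather than tune when expiry happens:

    sa-learn --dump magic     # show token counts and atime statistics
    sa-learn --force-expire   # run an expiry pass right now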

Any hints on how to keep the Bayes database tuned up to avoid false
positives?
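
For reference, when a false positive does come in I have been
retraining it with roughly the following (the path is just an
example):

    # reclassify a misfiled message as ham
    sa-learn --ham /path/to/false-positive.msg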

Thanks
Bob
