Re: bayes training question

Loren Wilton Mon, 23 May 2005 04:52:19 -0700

> - I get some messages marked as SPAM coming from this mailing list,
> since the body contains URLs and text from real spam messages: do I have
> to feed them in my DB as ham or this can cause some kind of bayes
> poisoning ?


The best thing is to avoid having the mail from this list go through SA.
There are various ways to do this, depending on your mail setup.


> - I assume that the training is more important for the messages marked
> with BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score:
> 0.5998]; is this correct ?

Probably most important are cases where Bayes guessed wrong, rather than
simply not being real sure.  Whenever you see that Bayes leaned the wrong
way on a message, always train that message as what it really is, ham or
spam.  This way it will get to know what is what for you.

Second most important would be training stuff that scores close to 50%.
Personally I tend to dump most spam that scores less than about 80% into the
spam training bucket.  Now and then I'll throw a handful of known ham in the
ham bucket, to try to keep the number of learned ham/spam somewhat balanced.
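
As a rough illustration of the ordering described above, here is a minimal
Python sketch.  The function name and the 1/2/3 priority levels are made up
for the example and are not part of SpamAssassin; the 40 to 60% band is the
BAYES_50 range quoted in the question, and the 80% cutoff is just the
personal habit mentioned above.

```python
def training_priority(bayes_prob, is_spam):
    """Return 1 (train first), 2 (train next) or 3 (optional).

    bayes_prob: Bayes spam probability for the message, from 0.0 to 1.0.
    is_spam:    your own verdict after looking at the message.
    Both the name and the return values are hypothetical, for illustration.
    """
    leaned_spam = bayes_prob > 0.5
    # Highest priority: Bayes leaned the wrong way for this message.
    if bayes_prob != 0.5 and leaned_spam != is_spam:
        return 1
    # Next: Bayes was undecided (the BAYES_50 band, 40 to 60%).
    if 0.4 <= bayes_prob <= 0.6:
        return 2
    # Next: real spam that Bayes was less than about 80% sure of.
    if is_spam and bayes_prob < 0.8:
        return 2
    # Everything else was already classified confidently and correctly.
    return 3


if __name__ == "__main__":
    # (probability, your verdict) pairs; the values are invented examples.
    for prob, verdict in [(0.20, True), (0.5998, True), (0.95, True)]:
        print(prob, verdict, training_priority(prob, verdict))
```

Messages that come back as 1 or 2 would then go to sa-learn --spam or
sa-learn --ham according to your own verdict, not Bayes's guess.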


> - Shall I train as ham also the messages not marked as SPAM but having a
> score close between 1/2 and 3/4 ? I mean, feeding also "normal" messages
> into the system helps to have a good bayes filtering ?

I'm not absolutely sure what you are saying here.  If you are asking if you
should train known ham as ham, the answer is yes.  Bayes needs to be able to
decide which tokens are ham and which are spam.  It can only do this if it
sees both ham and spam.  If you have ham that Bayes scores above 20 or 30%,
you should certainly train it as ham.  However, even throwing ham that
scores near 0 into training every so often is a good idea.
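
One way to picture the ham side of this is the small Python sketch below.
It is only an illustration under assumptions: the function name, the
dictionary of message ids, and the default sample size are invented for the
example, and the 20% threshold is simply the low end of the range mentioned
above.

```python
import random


def pick_ham_to_train(ham_probs, threshold=0.2, sample_size=5, seed=None):
    """Choose which known-ham messages to feed to ham training.

    ham_probs:   dict mapping a message id to the Bayes spam probability
                 that was reported for it (all messages are known ham).
    threshold:   ham scoring above this is always trained (assumed 20%).
    sample_size: how many low-scoring ham messages to add as well.
    """
    rng = random.Random(seed)
    # Ham that Bayes scored suspiciously high: always train it.
    must_train = sorted(m for m, p in ham_probs.items() if p > threshold)
    # Ham scoring near 0: train a small random handful now and then.
    low = sorted(m for m, p in ham_probs.items() if p <= threshold)
    extra = rng.sample(low, min(sample_size, len(low)))
    return must_train + extra


if __name__ == "__main__":
    # Message ids and probabilities are invented examples.
    probs = {"msg-a": 0.35, "msg-b": 0.01, "msg-c": 0.0, "msg-d": 0.25}
    print(pick_ham_to_train(probs, sample_size=1))
```

The chosen messages would then be passed to sa-learn --ham.  Raising
sample_size is one way to keep the learned ham count from falling far
behind the spam count.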

        Loren
