> - I get some messages marked as SPAM coming from this mailing list,
> since the body contains URLs and text from real spam messages: do I have
> to feed them into my DB as ham, or can this cause some kind of Bayes
> poisoning?
The best thing is to avoid having the mail from this list go through SA at
all. There are various ways to do this, depending on your mail setup.

> - I assume that the training is more important for the messages marked
> with BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score:
> 0.5998]; is this correct?

Probably most important are cases where Bayes guessed wrong, rather than
cases where it simply wasn't very sure. Always train anything you see where
Bayes leaned the wrong way, as ham or spam accordingly. That way it will
learn what is what for you.

Second most important would be training messages that score close to 50%.
Personally I tend to dump most spam that scores less than about 80% into the
spam training bucket. Now and then I'll throw a handful of known ham into the
ham bucket, to try to keep the numbers of learned ham and spam roughly
balanced.

> - Shall I also train as ham the messages not marked as SPAM but having a
> score between 1/2 and 3/4? I mean, does feeding "normal" messages into
> the system also help to get good Bayes filtering?

I'm not absolutely sure what you are asking here. If you are asking whether
you should train known ham as ham, the answer is yes. Bayes needs to be able
to decide which tokens indicate ham and which indicate spam, and it can only
do this if it sees both ham and spam. If you have ham that Bayes scores at
more than 20 or 30%, you should certainly train it as ham. However, even
throwing ham that scores near 0 into training every so often is a good idea.

Loren
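The rules of thumb in this thread (train whatever Bayes got wrong first, then whatever sits near 50%, then spam below about 80% and ham above about 20 to 30%) can be summed up as a small decision function. This is only a sketch, not anything SpamAssassin itself provides: the name `training_decision`, the priority numbers, and the default cutoffs (0.8 for spam, 0.2 for ham, and the 0.4 to 0.6 band taken from the BAYES_50 description) are illustrative assumptions. `bayes_prob` is the Bayes probability SA reports, and `is_spam` is your own judgement of what the message really is.

```python
from typing import Optional, Tuple


# Hypothetical helper encoding the rules of thumb above; not a SpamAssassin API.
def training_decision(
    bayes_prob: float,
    is_spam: bool,
    spam_cutoff: float = 0.8,  # assumed: train spam that Bayes scores below ~80%
    ham_cutoff: float = 0.2,   # assumed: train ham that Bayes scores above ~20%
) -> Optional[Tuple[str, int]]:
    """Return (bucket, priority), or None if training this message is optional.

    bucket is "spam" or "ham"; priority 1 = Bayes leaned the wrong way,
    2 = Bayes was unsure (the BAYES_50 band, 0.4 to 0.6), 3 = routine.
    """
    if not 0.0 <= bayes_prob <= 1.0:
        raise ValueError("bayes_prob must be between 0 and 1")
    bucket = "spam" if is_spam else "ham"

    # Priority 1: Bayes guessed wrong, so always train.
    leaned_wrong = (bayes_prob < 0.5) if is_spam else (bayes_prob > 0.5)
    if leaned_wrong:
        return bucket, 1

    # Priority 2: Bayes was not sure either way.
    if abs(bayes_prob - 0.5) <= 0.1:
        return bucket, 2

    # Priority 3: Bayes was right, but not confident enough.
    if is_spam and bayes_prob < spam_cutoff:
        return bucket, 3
    if not is_spam and bayes_prob > ham_cutoff:
        return bucket, 3

    # Confidently correct: training is optional (occasional ham still helps).
    return None


if __name__ == "__main__":
    samples = [(0.30, True), (0.5998, True), (0.70, True), (0.95, True),
               (0.90, False), (0.25, False), (0.05, False)]
    for prob, spam in samples:
        print(prob, "spam" if spam else "ham", training_decision(prob, spam))
```

Messages selected this way would then be fed to `sa-learn --spam` or `sa-learn --ham`. Returning `None` rather than "ham" for confidently scored ham leaves room for the occasional batch of known ham that keeps the learned counts roughly balanced.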