Matt Kettler wrote:
>
> Yes, bayes poison should be trained without worry. However, bayes poison is
> not the topic of discussion here. We are talking about mis-learning, something
> COMPLETELY different.

Kai Schaetzl talked about "prevent[ing] you from accidently poisoning your Bayes
db", so I assumed we were talking about bayes poisoning.

> Mis-learning a ham message as spam is always bad, and can have a minor or
> severe impact depending on the circumstances. There is no question of that
> mis-learning should be avoided whenever possible.

I agree.

> Learning bayes poison as spam isn't a matter of "oh, it doesn't matter because
> it's in the random noise" it's a matter of accurate training. You WANT SA to
> learn about common tokens that are used by both categories. This is important
> to SA's accuracy, as it's a fact of reality.

I agree.

> Mis-learning is not random noise, it doesn't reflect reality, and it is not
> the same thing as bayes poison. Not at ALL the same. It's just bad.
>
>>>In conclusion, I feel confident in letting SA learn from every message
>>>that I am certain that it can be certain is spam.
>
> Are you sure your conclusions are based on accurate perceptions of the
> consequences?

I am sure that there will be no mislearning, even if I lower the body and/or
header limits a bit, and that any mislearning that nevertheless may occur can be
rectified by relearning. The mail volumes are low.

What I still would like to know is the theory behind the hardcoded 3 point
limits. Can someone give as an example a message that would be mislearnt if it
weren't for those limits?

-- 
Magnus Holmgren
[EMAIL PROTECTED]
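
For reference, the auto-learn decision being discussed can be sketched roughly as below. This is a simplified illustration in Python, not SpamAssassin's actual code: it assumes the documented default thresholds (0.1 for ham, 12.0 for spam) and the 3 point header and body minimums mentioned above, and it ignores details such as which rules are excluded from the auto-learn score. The Hit type, the is_header flag, the rule names and the scores are all hypothetical. The sample message is one possible case the limits could guard against: it crosses the spam threshold on header rules alone, so learning it would teach Bayes that an otherwise unremarkable body is spammy.

```python
from dataclasses import dataclass

# Documented defaults for the auto-learn thresholds.
THRESHOLD_NONSPAM = 0.1
THRESHOLD_SPAM = 12.0
# The hardcoded minimums discussed in this thread.
MIN_HEADER_POINTS = 3.0
MIN_BODY_POINTS = 3.0


@dataclass
class Hit:
    name: str        # hypothetical rule name
    score: float     # points contributed by the rule
    is_header: bool  # hypothetical flag: True = header rule, False = body rule


def autolearn_decision(hits, min_header=MIN_HEADER_POINTS, min_body=MIN_BODY_POINTS):
    """Return "ham", "spam" or "skip" for a list of rule hits (simplified)."""
    total = sum(h.score for h in hits)
    if total < THRESHOLD_NONSPAM:
        return "ham"
    if total < THRESHOLD_SPAM:
        return "skip"  # between the thresholds: too uncertain to learn
    header_points = sum(h.score for h in hits if h.is_header)
    body_points = sum(h.score for h in hits if not h.is_header)
    # Spam is only learned if both parts of the message look spammy.
    if header_points < min_header or body_points < min_body:
        return "skip"
    return "spam"


# Hypothetical message: high score from header rules, nearly clean body.
header_heavy = [
    Hit("HYPOTHETICAL_DNSBL", 8.0, True),
    Hit("HYPOTHETICAL_FORGED_HDR", 5.0, True),
    Hit("HYPOTHETICAL_BODY_RULE", 0.5, False),
]

print(autolearn_decision(header_heavy))                # with the 3 point limits
print(autolearn_decision(header_heavy, min_body=0.0))  # body limit removed
```

With the limits in place the message is skipped; with the body limit removed it would be learned as spam on the strength of its headers alone. Whether such messages occur often enough to matter at low mail volumes is a separate question.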