Re: Forcing autolearn

Magnus Holmgren Thu, 04 Aug 2005 02:47:57 -0700

Matt Kettler wrote:
> 
> Yes, bayes poison should be trained without worry. However, bayes poison is
> not the topic of discussion here. We are talking about mis-learning, something
> COMPLETELY different.


Kai Schaetzl talked about "prevent[ing] you from accidently poisoning
your Bayes db", so I assumed we were talking about bayes poisoning.

> Mis-learning a ham message as spam is always bad, and can have a minor or
> severe impact depending on the circumstances. There is no question of that
> mis-learning should be avoided whenever possible.

I agree.

> Learning bayes poison as spam isn't a matter of "oh, it doesn't matter because
> it's in the random noise" it's a matter of accurate training. You WANT SA to
> learn about common tokens that are used by both categories. This is important
> to SA's accuracy, as it's a fact of reality.

I agree.

> Mis-learning is not random noise, it doesn't reflect reality, and it is not
> the same thing as bayes poison. Not at ALL the same. It's just bad.
>
>>>In conclusion, I feel confident in letting SA learn from every message
>>>that I am certain that it can be certain is spam.
> 
> Are you sure your conclusions are based on accurate perceptions of the
> consequences?
> 
I am sure that there will be no mislearning, even if I lower the body
and/or header limits a bit, and that any mislearning that nevertheless
may occur can be rectified by relearning. The mail volumes are low.

What I would still like to know is the theory behind the hardcoded
3-point limits. Can someone give an example of a message that would be
mislearnt if it weren't for those limits?
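
To make the question concrete, here is a minimal sketch of the check as I
understand it from the documentation. It is not SpamAssassin's actual code.
The names (Scores, autolearn_decision) are hypothetical, the 12.0 and 0.1
thresholds are assumed to be the defaults of bayes_auto_learn_threshold_spam
and bayes_auto_learn_threshold_nonspam, and the real logic has further
conditions (for example, which rules count towards the autolearn score) that
are left out here.

```python
from dataclasses import dataclass

# Assumed defaults of the bayes_auto_learn_threshold_* settings.
SPAM_THRESHOLD = 12.0
HAM_THRESHOLD = 0.1

# The hardcoded limits discussed in this thread.
MIN_HEADER_POINTS = 3.0
MIN_BODY_POINTS = 3.0


@dataclass
class Scores:
    """Hypothetical container: points split by the kind of rule that hit."""
    header_points: float
    body_points: float

    @property
    def total(self) -> float:
        return self.header_points + self.body_points


def autolearn_decision(scores: Scores,
                       min_header: float = MIN_HEADER_POINTS,
                       min_body: float = MIN_BODY_POINTS) -> str:
    """Return "spam", "ham" or "no" (simplified; see assumptions above)."""
    if scores.total <= HAM_THRESHOLD:
        return "ham"
    if (scores.total >= SPAM_THRESHOLD
            and scores.header_points >= min_header
            and scores.body_points >= min_body):
        return "spam"
    return "no"


# A message that scores 13 points on header rules alone is not learnt
# with the default limits, but is learnt as spam once the body limit
# is lowered to zero.
header_only = Scores(header_points=13.0, body_points=0.0)
print(autolearn_decision(header_only))                # no
print(autolearn_decision(header_only, min_body=0.0))  # spam
```

This only shows what the limits gate, namely messages whose score comes
almost entirely from one side. It does not show why 3 points was chosen,
which is what I am asking about.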

-- 
Magnus Holmgren
[EMAIL PROTECTED]
