On Wed, 24 Sep 2003, Matt Kettler wrote:

> The message must be _exactly_ the same as it originally was, headers and 
> all. Even very subtle changes can cause the bayes engine to learn things 
> you might not expect. You want it to learn about ham and spam, not about 
> forwarded message formats.
> 
> [...] it doesn't learn the user; rather it learns "any forwarded message
> is spam", "any message with message headers similar to the ones generated
> by this user's mail client is spam". You get the picture.

That'd all be true if the users forwarded only spam for learning.  If they
forwarded both ham and spam, Bayes would see the same client and forwarding
tokens in both classes and learn to ignore them.  Right?
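
As a rough illustration of that point, here is a toy per-token estimate in
Python.  It is a simplified Graham-style calculation, not SpamAssassin's
actual Bayes code, and the token name and sample messages are made up.

```python
from collections import Counter

def train(messages):
    """Count, per token, how many messages in one class contain it."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts

def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified per-token spam estimate; 0.5 means 'uninformative'."""
    s = spam_counts[token] / n_spam
    h = ham_counts[token] / n_ham
    if s + h == 0:
        return 0.5  # token never seen in either class
    return s / (s + h)

# Hypothetical token that the local client adds to every forwarded message.
FWD = "X-Forwarded-By-Client"

spam = [[FWD, "viagra", "free"], [FWD, "free", "offer"]]

# Case 1: users forward both spam and ham for learning.
ham_forwarded = [[FWD, "meeting", "agenda"], [FWD, "patch", "review"]]
p_both = spam_probability(FWD, train(spam), train(ham_forwarded),
                          len(spam), len(ham_forwarded))

# Case 2: users forward only spam; ham is learned without the forwarding token.
ham_direct = [["meeting", "agenda"], ["patch", "review"]]
p_spam_only = spam_probability(FWD, train(spam), train(ham_direct),
                               len(spam), len(ham_direct))

print(p_both, p_spam_only)
```

With these made-up counts the forwarding token comes out neutral when both
classes are forwarded, and looks like a strong spam sign only when spam alone
is forwarded.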

It's true that tests involving header tokens will be much less accurate
if the format learned is not the same as the format tested.  But if
learning really failed completely in these circumstances, it would be
nearly impossible to train, e.g., SAproxy, with messages that have already
passed through Windows mail clients on their way to the disk.  And yet
people DO train SAproxy that way, and it seems to work.



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
