How does the Bayes training work, anyway?
In short:
First, you need to understand that Bayes is based on breaking email down into "tokens". For simplicity, you can just consider each word of an email to be a token. SA uses other tokens too (header fragments, etc.), but it does use words as tokens as well, and they are the easiest to think about.
Bayes training works by breaking the email up into tokens and keeping track of the number of times each token has been seen in spam and in nonspam mail. From those two counts, a "probability of spam" for the token can be calculated.
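To make the counting idea concrete, here is a minimal sketch in Python. It is not SA's code: the TokenDB class, its method names, and the plain "spam sightings divided by all sightings" ratio are assumptions made purely for illustration, and words are the only tokens it looks at.

```python
from collections import Counter


class TokenDB:
    """Toy token database (hypothetical, not SA's real storage)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of nonspam messages containing it

    def train(self, text, is_spam):
        # Simplification: each distinct lowercase word counts once per message.
        tokens = set(text.lower().split())
        (self.spam if is_spam else self.ham).update(tokens)

    def spam_probability(self, token):
        # Fraction of this token's sightings that were in spam.
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return None  # never seen: no evidence either way
        return s / (s + h)


db = TokenDB()
db.train("cheap pills now", is_spam=True)
db.train("meeting notes now", is_spam=False)
print(db.spam_probability("cheap"))    # 1.0
print(db.spam_probability("now"))      # 0.5
print(db.spam_probability("meeting"))  # 0.0
```

A token seen only in spam comes out at 1.0, one seen only in nonspam at 0.0, and one seen equally in both at 0.5.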
Bayes scoring works by checking all the tokens present in the email against the database and generating an aggregate probability of spam by more-or-less averaging them all together.
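As a rough illustration of the "more-or-less averaging" step, the sketch below takes a plain mean of the per-token probabilities. The function name, the token_probs dictionary, and the choice to skip unknown tokens and fall back to 0.5 are all assumptions for the example; the real combining method is more involved than a simple mean.

```python
def message_spam_probability(text, token_probs, neutral=0.5):
    """Average the known per-token spam probabilities for one message.

    token_probs is a hypothetical dict mapping token -> probability of spam.
    Tokens missing from it are skipped; if nothing is known, return neutral.
    """
    tokens = set(text.lower().split())
    known = [token_probs[t] for t in tokens if t in token_probs]
    if not known:
        return neutral
    return sum(known) / len(known)


token_probs = {"cheap": 1.0, "now": 0.5, "meeting": 0.0}
print(message_spam_probability("cheap now", token_probs))      # 0.75
print(message_spam_probability("meeting now", token_probs))    # 0.25
print(message_spam_probability("hello world", token_probs))    # 0.5
```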
Technically, the exact details are a bit more complex than described above. However, they aren't too important for getting a general understanding of it all. There are a lot of boring details involving statistical methods, string parsing, token selection, etc., but they're largely irrelevant here.
If this one message gets trained as --spam, how much of an effect does that have next time around?
The impact of training one message as spam varies significantly depending on what the rest of your training looks like.
If most of the tokens in the email have been seen thousands of times in nonspam, and only a few times in spam, the training will have little or no impact. The difference between 1 in 2000 and 2 in 2000 isn't that significant; it still amounts to more or less 0 probability of spam.
On the other hand, if they're mostly tokens that have never been seen before at all, the impact can be huge. Misspelled words are VERY likely to be in this category.
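The two cases can be worked through with a little arithmetic. The sketch below uses a simplified "spam sightings divided by all sightings" ratio, which is an assumption for illustration and not SA's exact formula; the function name is made up as well.

```python
def spam_ratio(spam_seen, ham_seen):
    """Simplified per-token probability: share of sightings that were spam."""
    total = spam_seen + ham_seen
    return spam_seen / total if total else None  # None: token never seen


# Common token: 2000 nonspam sightings, then one more spam training.
common_before = spam_ratio(1, 2000)
common_after = spam_ratio(2, 2000)

# Brand-new token (e.g. a misspelling): unseen, then trained once as spam.
new_before = spam_ratio(0, 0)
new_after = spam_ratio(1, 0)

print(f"common token: {common_before:.4f} -> {common_after:.4f}")
print(f"new token:    {new_before} -> {new_after}")
```

The common token moves by a few hundredths of a percent and stays near 0, while the new token jumps from "no information" straight to looking entirely spammy.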