Bart Schaefer writes:
>The point is that -- aside from the rule "do not teach spam as ham, nor
>teach ham as spam" -- YOU DON'T REALLY KNOW what data will increase or
>decrease the classifier's accuracy. As a human, you're good at making the
>gestalt (and subjective) judgement "this is spam" (or ham). You're not
>good at instantly recognizing every fragment of the message that the
>classifier considers to be a token and then determining whether each such
>token occurs more frequently (or uniquely) in spam or ham.

Ah, that's the good point right there about Bayes. Bayesish probabilistic
classification systems work well *because* you don't have to second-guess
them -- just feed them spam and ham, and they will "do the right thing",
even if you think you might be confusing them.

--j.

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
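
To make the point about tokens concrete, here is a rough sketch of how a Bayes-style filter might score individual tokens after training. It is not SpamAssassin's actual implementation: the tokenizer, the function names (tokenize, train, token_spam_probability), the add-one smoothing, and the four sample messages are all assumptions made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercased runs of letters, digits, "$", "'" and "-".
    # Real filters use far more elaborate rules (headers, URLs, and so on).
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def train(messages):
    # Count, for each token, how many messages it appears in at least once.
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    # Smoothed fraction of spam and ham messages containing the token
    # (add-one smoothing is an assumption, chosen to avoid 0 and 1).
    s = (spam_counts[token] + 1) / (n_spam + 2)
    h = (ham_counts[token] + 1) / (n_ham + 2)
    # Probability the token points to spam, assuming equal priors.
    return s / (s + h)

# Made-up training data, purely for illustration.
spam = ["Win a FREE prize now", "free money, click now"]
ham = ["Meeting moved to noon", "Can you review my patch now?"]

spam_counts = train(spam)
ham_counts = train(ham)

for tok in ("free", "now", "patch"):
    p = token_spam_probability(tok, spam_counts, ham_counts, len(spam), len(ham))
    print(f"{tok}: {p:.2f}")
```

With this toy data, "free" scores 0.75 and "patch" 0.33, which matches intuition, but the innocuous-looking "now" comes out at 0.60 simply because it shows up in both spam messages and only one ham message. That is the kind of per-token judgement a human would not make by eye, and the reason to just feed the filter correctly labelled mail and let the counts sort it out.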