NOTE: I read the Corvigo whitepaper, but I don't know ANYTHING about their product. This statement is based entirely upon what I have assumed after reading this single document.
> Essentially AI interpretation of the meaning (or intent as they put it)
> of language in order to identify spam.
> Correct me if I'm wrong (and I'm sure I have at least over-simplified).

The key to this seems to be in the statement "Make m0-ney fast from home!", where Corvigo ignores "m0-ney" because it is "unknown". They then key off of "Make ... fast from home!". Their claim is that Corvigo's offering is better than Bayesian because it ignores unknowns, instead of classifying unknowns and then scoring based upon the value of possibly-improperly-weighted unknowns (like m0-ney).

But doesn't the naive Bayesian algorithm take care of this inherently? Doesn't limiting the score to a fixed number of extrema reduce the likelihood that a new or bizarre word is considered at all when scoring the e-mail as a whole? (See -k in 'man bmf' for my definition of extrema.) Once I hit that extrema sweet spot (somewhere between 10 and 20 tokens per message, based upon what is ignored), I am not looking at anything but the "meat" of the message.

Bayes' algorithm shows us that if a message contains ~20 distinct "spam" tokens, the message has a very high (99%) likelihood of being actual spam. This can be improved (I think this is one of the ways that CRM-114 works?) by tokenizing individual words together with their neighbors: (Make money) and (money fast). Again, Bayes can do what Corvigo claims "contextually".

On the surface, it seems to me that Corvigo is offering a mild improvement to Bayes, which could be easily incorporated into current Bayesian offerings. Consider: train your filter with a significant, personal corpus of hand-sorted spam and ham. Then be able to "turn off" the learning feature of your Bayesian filter. This would essentially give you what Corvigo has (at least, per their simple example). If my Bayes doesn't know "m0-ney", then it keys off of "Make ... fast from home!".

> Is this approach being pursued in open source space?

I think we have it, with minor tweaks to existing code (if even necessary at all).
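To make the argument above concrete, here is a minimal Python sketch of a naive Bayesian scorer that only looks at the k most extreme known tokens and skips anything it has never seen. It is an illustration under my own assumptions, not bmf's, CRM-114's, or Corvigo's actual code: the function names (tokenize, train, score), the add-one smoothing, the toy SPAM/HAM corpus, and the default k=15 are all hypothetical choices. "Turning off learning" here simply means never updating the trained table again.

```python
import math
import re

# Toy hand-sorted corpus (hypothetical data, for illustration only).
SPAM = ["make money fast from home",
        "make money fast today",
        "work from home and make money"]
HAM = ["meeting moved to monday",
       "see you at home tonight",
       "please review the patch"]

def tokenize(text, use_pairs=False):
    """Lowercase word tokens; optionally add adjacent-word pairs as tokens."""
    words = re.findall(r"[a-z0-9$'-]+", text.lower())
    tokens = list(words)
    if use_pairs:
        tokens += [a + " " + b for a, b in zip(words, words[1:])]
    return tokens

def train(spam_msgs, ham_msgs, use_pairs=False):
    """Return {token: estimated P(spam | token)} using add-one smoothing."""
    spam_counts, ham_counts = {}, {}
    for msgs, counts in ((spam_msgs, spam_counts), (ham_msgs, ham_counts)):
        for msg in msgs:
            # Count each token once per message.
            for tok in set(tokenize(msg, use_pairs)):
                counts[tok] = counts.get(tok, 0) + 1
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = (spam_counts.get(tok, 0) + 1) / (len(spam_msgs) + 2)
        h = (ham_counts.get(tok, 0) + 1) / (len(ham_msgs) + 2)
        probs[tok] = s / (s + h)
    return probs

def score(text, probs, k=15, use_pairs=False):
    """Combine the k known tokens farthest from 0.5; unknown tokens are skipped."""
    known = [probs[t] for t in set(tokenize(text, use_pairs)) if t in probs]
    extrema = sorted(known, key=lambda p: abs(p - 0.5), reverse=True)[:k]
    if not extrema:
        return 0.5  # nothing known: no evidence either way
    # Naive Bayes combination, done in log space to avoid underflow.
    log_spam = sum(math.log(p) for p in extrema)
    log_ham = sum(math.log(1.0 - p) for p in extrema)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# "Learning turned off": the table is trained once and never updated.
FROZEN = train(SPAM, HAM)
```

With this frozen table, score("Make m0-ney fast from home!", FROZEN) gives the same result as score("Make fast from home", FROZEN), because "m0-ney" was never seen in training and so never enters the product. Passing use_pairs=True to both train and score adds the neighbor tokens such as "make money" and "money fast".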
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk