Re: A different approach to scoring spamassassin hits

Marc Perkel Sat, 30 Jun 2007 07:42:07 -0700


Tom Allison wrote:

On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote:
Tom Allison wrote:
For some years now there has been a lot of effective spam filtering using statistical approaches with variations on Bayesian theory. Some of these are inverse Chi Square modifications to Naive Bayes, and even CRM114 and other "languages" have been developed to improve the scoring of statistical analysis of spam. For all statistical processes the spamicity is always between 0 and 1.
<snip>
Many thanks to those of you who have read this far for your patience and consideration.
Tom, I suggested something similar to that years ago and I'd still like to see it tried out. I wonder what would happen if you stripped out the body and ran Bayes just on the headers and the rules and let Bayes figure it out. You do have to have some points to start with to get Bayes pointed in the right direction. But you could use blacklists and whitelists to do Bayes training. It also needs more rules to identify ham and not just rules to identify spam.
I was under the belief that there were ham-centric tests that would result in negative point scores.
Ham doesn't try to be evasive. It's pretty easy to identify. Without SA tagging much of it falls to <<0.5 and whitelisting would capture many of the exceptions.
As for headers-only testing -- the first five lines of stock spam are very telling...
My question about SA is the PerMsgStatus (I think). Is this the place to retrieve all the rules information? I know today you can get a list of all the rules that HIT, but is that where you would look to find all the rules that were attempted? Or is there a better place for it?


There are some ham tests in SA but not nearly enough.
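
A rough sketch of the rules-only idea, in case it helps the discussion. It assumes each message has already been reduced to the list of rule names that hit, and that the training label comes from blacklist/whitelist membership. The RuleBayes class and the rule names are hypothetical and are not part of SpamAssassin. It is plain naive Bayes with add-one smoothing, not the chi-square combining mentioned above, and it only looks at rules that hit, not rules that were attempted and missed.

```python
import math
from collections import Counter


class RuleBayes:
    """Naive Bayes over the set of rule names that hit on a message (hypothetical helper)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, rules_hit, label):
        # label is "spam" or "ham"; here it is assumed to come from a
        # blacklist or whitelist match rather than from hand sorting.
        self.totals[label] += 1
        self.counts[label].update(set(rules_hit))

    def spamicity(self, rules_hit):
        # Work in log-odds, starting from the smoothed class prior.
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for rule in set(rules_hit):
            # Add-one smoothing so an unseen rule contributes nothing.
            p_spam = (self.counts["spam"][rule] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][rule] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Map log-odds back to a probability between 0 and 1.
        return 1.0 / (1.0 + math.exp(-log_odds))


nb = RuleBayes()
# Made-up rule names, for illustration only.
for _ in range(2):
    nb.train(["ON_BLACKLIST", "FORGED_HELO"], "spam")
    nb.train(["ON_WHITELIST", "HAS_LIST_ID"], "ham")

spam_score = nb.spamicity(["FORGED_HELO"])
ham_score = nb.spamicity(["HAS_LIST_ID"])
unknown_score = nb.spamicity(["NEVER_SEEN_RULE"])
```

With more ham rules in the training data, the ham side of the ratio gets more evidence to work with, which is the point about needing rules that identify ham and not just spam.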
