Tom Allison wrote:
On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote:
Tom Allison wrote:
For some years now there has been a lot of effective spam filtering
using statistical approaches with variations on Bayesian theory.
Some of these are inverse chi-square modifications to Naive Bayes, and
CRM114 and other "languages" have been developed to improve the
statistical scoring of spam. For all statistical
processes the spamicity is always between 0 and 1.
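
For readers unfamiliar with the inverse chi-square approach mentioned above, here is a minimal Python sketch of chi-square combining of per-token spam probabilities, in the style popularized by SpamBayes. The function names (chi2q, combine) and the clamping bounds of 0.01 and 0.99 are assumptions made for illustration, not anything taken from SpamAssassin or CRM114. It only shows why the combined spamicity always lands between 0 and 1.

```python
import math

def chi2q(x2, dof):
    """Probability that a chi-square variable with an even number of
    degrees of freedom (dof) exceeds x2."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)   # may underflow to 0.0 for very large x2
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Assumption: clamp so that log(0) can never occur.
    clamped = [min(max(p, 0.01), 0.99) for p in probs]
    spam_ev = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clamped), 2 * n)
    ham_ev = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clamped), 2 * n)
    return (spam_ev - ham_ev + 1.0) / 2.0
```

A message whose tokens all look spammy scores near 1, one whose tokens all look hammy scores near 0, and conflicting or absent evidence sits near 0.5.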
<snip>
Many thanks to those of you who have read this far for your
patience and consideration.
Tom, I suggested something similar to that years ago and I'd still
like to see it tried out. I wonder what would happen if you stripped
out the body and ran bayes just on the headers and the rules and let
bayes figure it out. You do have to have some points to start with to
get bayes pointed in the right direction. But you could use black
lists and white lists to do bayes training. It also needs more rules to
identify ham, not just rules to identify spam.
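
To make the idea concrete, here is a rough Python sketch of a Naive Bayes classifier whose "words" are header tokens and rule names rather than body text, bootstrapped from black list and white list lookups. Everything named here is hypothetical: the RuleBayes class, the label_from_lists helper, and the token strings are illustrations only, and none of it is SpamAssassin's actual Bayes code. Equal priors and add-one smoothing are assumed.

```python
import math
from collections import Counter

def label_from_lists(sender, blacklist, whitelist):
    """Hypothetical helper: derive a training label from list lookups."""
    if sender in blacklist:
        return "spam"
    if sender in whitelist:
        return "ham"
    return None  # unknown sender: do not train on this message

class RuleBayes:
    """Naive Bayes over rule names and header tokens (no body text)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Count each token once per message.
        self.counts[label].update(set(tokens))
        self.totals[label] += 1

    def spamicity(self, tokens):
        log_odds = 0.0  # assumption: equal prior odds of spam and ham
        for tok in set(tokens):
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so exp() cannot overflow, then map log-odds into (0, 1).
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

# Usage with made-up rule names and senders:
bayes = RuleBayes()
blacklist, whitelist = {"192.0.2.1"}, {"198.51.100.7"}
samples = [
    ("192.0.2.1", ["HYPOTHETICAL_SPAM_RULE", "hdr:x-mailer=bulk"]),
    ("198.51.100.7", ["HYPOTHETICAL_HAM_RULE", "hdr:list-id"]),
]
for sender, tokens in samples:
    label = label_from_lists(sender, blacklist, whitelist)
    if label is not None:
        bayes.train(tokens, label)
```

Note that ham rules feed the classifier exactly the same way spam rules do, which is why having more of them matters.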
I was under the impression that there were ham-centric tests that would
result in negative point scores.
Ham doesn't try to be evasive. It's pretty easy to identify. Even without
SA tagging, much of it falls far below 0.5, and whitelisting would capture
most of the exceptions.
As for headers-only testing -- the first five lines of stock spam are
very telling...
My question about SA concerns PerMsgStatus (I think): is this the place
to retrieve all the rule information? I know that today you can get a
list of all the rules that hit, but is this where you would look to
find all the rules that were attempted? Or is there a better place
for it?
There are some ham tests in SA but not nearly enough.