Re: New Bayes like paradigm

Marc Perkel Tue, 27 Sep 2011 20:13:39 -0700


On 9/25/2011 5:37 PM, RW wrote:

On Sun, 25 Sep 2011 09:28:32 -0700
Marc Perkel wrote:

Here's what I'd like to be able to do. I'd like a program of some
sort where I could take word tokens - like the names of rules that were
triggered - and look for rule combinations that indicate spam or ham.
For example, a message triggers 4 rules A B C and D. These rules are
combined as follows:

A
...
ABCD

Each rule combo is then looked up for how often it occurs in spam and
how often it occurs in ham. Then the results are combined into some
sort of likelihood of being spam or ham.

There are a couple of problems with this. The first is that most SA
rules are either neutral or strong spam indicators, which makes them
unsuitable for the sort of techniques used in Bayes.

The second is that most of the scope for meaningful combinations is in
high-scoring spam. Low-scoring spams are low-scoring because SA couldn't
find much evidence - in these you're going to end up with
meaningless strong+neutral combinations like BAYES_99+SPF_PASS.

That's not to say that it can't be done in a more general sense; the
scoring system is a way of converting rule combinations into a
classification.

Similar questions have been asked before; IIRC someone came up with
an alternative way of getting a classification from the rule hits
based on learning, and made a basic plugin that tweaked the score
accordingly.

Here's the kind of thing I'm seeing. Spam talks about money - low score.
Spam talks about Jesus - low score. Spam talks about money and Jesus and
throws in a "dear someone", and it's spam. I'm hoping to detect such
combinations automatically.
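
A minimal sketch of the combination idea, in Python, might look like the
following. It is an illustration under stated assumptions, not an existing
SpamAssassin plugin: the class name ComboClassifier, the helper rule_combos,
the max_size cap and the rule names MONEY, JESUS and DEAR are all
hypothetical. Each combination of triggered rules is counted in spam and in
ham, and the smoothed per-combination estimates are summed as log-odds.
Overlapping combinations are not independent, so the result is best read as
a ranking signal rather than a calibrated probability.

```python
from collections import Counter
from itertools import combinations
import math


def rule_combos(rules, max_size=None):
    """Return every non-empty combination of the triggered rules.

    Each combination is a sorted tuple, so A+B and B+A map to the same key.
    max_size (hypothetical knob) caps the combination length, because the
    number of combinations grows as 2**n - 1.
    """
    names = sorted(set(rules))
    limit = len(names) if max_size is None else min(max_size, len(names))
    return [combo
            for size in range(1, limit + 1)
            for combo in combinations(names, size)]


class ComboClassifier:
    """Counts rule combinations in spam and ham (illustrative only)."""

    def __init__(self, max_size=None):
        self.max_size = max_size
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, rules, is_spam):
        """Record all combinations of one message's triggered rules."""
        combos = rule_combos(rules, self.max_size)
        if is_spam:
            self.spam_counts.update(combos)
            self.n_spam += 1
        else:
            self.ham_counts.update(combos)
            self.n_ham += 1

    def combo_probability(self, combo):
        """Smoothed estimate that a message with this combination is spam."""
        # Add-one smoothing keeps the estimate strictly between 0 and 1.
        p_spam = (self.spam_counts[combo] + 1) / (self.n_spam + 2)
        p_ham = (self.ham_counts[combo] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, rules):
        """Combine the per-combination estimates by summing log-odds."""
        log_odds = 0.0
        for combo in rule_combos(rules, self.max_size):
            p = self.combo_probability(combo)
            log_odds += math.log(p) - math.log(1.0 - p)
        # Clamp so math.exp cannot overflow on extreme evidence.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    clf = ComboClassifier()
    # Made-up training data, purely for illustration.
    for _ in range(10):
        clf.train(["MONEY", "JESUS", "DEAR"], is_spam=True)
        clf.train(["MONEY"], is_spam=False)
        clf.train(["JESUS"], is_spam=False)
        clf.train(["DEAR"], is_spam=False)
    print(clf.spam_probability(["MONEY"]))
    print(clf.spam_probability(["MONEY", "JESUS", "DEAR"]))
```

With this made-up data the three rules together score higher than any one
of them alone, because the pair and triple combinations were only ever seen
in spam. In practice the combination length would likely need a cap
(max_size here), and rarely seen combinations would need more careful
smoothing than add-one.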


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
