The industry that I work in is currently having its concept of risk assessment
thoroughly shaken. The sort of risks we deal with have three main, largely
independent factors. For years we've been assigning a value to each of these
factors, and then adding them up to come up with a figure representative of
relative risk.

Then along came some bright spark who knew a little bit about statistics. He
showed that we can estimate the probability of each of the three factors. Then he
pointed out that for someone to be injured, all three had to happen. And, for
independent events, the probability of a AND b AND c is the *product* of the
three probabilities, not the sum. It all makes sense. And, frighteningly, it gives
quite different results from the method we've used for years.
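To see how the two methods can disagree, here is a small Python sketch that scores two hypothetical hazards both ways. The factor values are invented for illustration, and the sketch assumes the values being added are the same probabilities being multiplied, which may not match the scales actually used in practice.

```python
import math

# Two hypothetical hazards, each described by the probabilities of its
# three independent factors (illustrative numbers, not real data).
hazards = {
    "A": (0.5, 0.5, 0.01),
    "B": (0.2, 0.2, 0.2),
}


def additive_score(factors):
    """The old method: add the factor values together."""
    return sum(factors)


def joint_probability(factors):
    """P(a AND b AND c) for independent factors: multiply them."""
    return math.prod(factors)


for name, factors in hazards.items():
    print(name, additive_score(factors), joint_probability(factors))
```

With these numbers the sum ranks A as the greater risk, while the product ranks B higher, because A's one very unlikely factor dominates its joint probability.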

So I'm thinking about writing myself a policy server for Postfix. I want to
consider different things, weight them and use a combination of factors to
decide whether or not to reject mail. Much like SpamAssassin (SA) does. Thinking about how to
weight things, I realised that the same principles could be applied to spam.
Perhaps.
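Postfix's policy delegation protocol sends each request as `name=value` lines ended by a blank line, and expects a reply such as `action=DUNNO` or `action=REJECT text`, also ended by a blank line. A minimal Python sketch of such a server, combining tests by multiplication, might look like this. The probability table, the rejection threshold and the `failed_tests` stub are all hypothetical placeholders, not a finished implementation.

```python
# Hypothetical P(ham | test failed) values, taken from the fictional
# figures in this post; a real server would need measured numbers.
HAM_PROB_IF_FAILED = {"rbl": 0.05, "helo": 0.2, "spf": 0.4}
REJECT_THRESHOLD = 0.01  # hypothetical: reject when P(ham) drops below 1%


def read_request(stream):
    """Read one policy request (name=value lines ended by a blank line).
    Returns None once the input is exhausted."""
    attrs = {}
    while True:
        line = stream.readline()
        if line == "":                 # end of input
            return attrs or None
        line = line.rstrip("\n")
        if line == "":                 # blank line ends the request
            return attrs
        name, _, value = line.partition("=")
        attrs[name] = value


def failed_tests(attrs):
    """Hypothetical stub: the real RBL, HELO and SPF checks would go here,
    using attributes such as attrs.get("client_address") and attrs.get("helo_name")."""
    return []


def ham_probability(failed):
    """Multiply P(ham) over the failed tests (treats them as independent)."""
    p = 1.0
    for test in failed:
        p *= HAM_PROB_IF_FAILED[test]
    return p


def decide(failed):
    if ham_probability(failed) < REJECT_THRESHOLD:
        return "action=REJECT Probable spam"
    return "action=DUNNO"              # no opinion; let later restrictions decide


def serve(instream, outstream):
    """Answer requests until the input closes; each reply ends with a blank line."""
    while True:
        attrs = read_request(instream)
        if attrs is None:
            break
        outstream.write(decide(failed_tests(attrs)) + "\n\n")
        outstream.flush()
```

Run under Postfix's spawn(8) daemon, a script could simply call `serve(sys.stdin, sys.stdout)`.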

For example (fictional figures here), say 95% of mail from clients in a
particular RBL is spam. We could say, then, that such an item of mail has a
probability of 0.05 of being ham. 80% of mail from clients giving a particular
form of HELO is spam - a probability of 0.2 that it is ham. 60% of SPF fails are
spam - a probability of 0.4 that such a mail is ham.

Thus if a piece of mail has failed all three of these tests, the probability of
it being ham is 0.05 * 0.2 * 0.4 = 0.004, or 1/250. Or put another way, we can
be 99.6% sure it is spam.
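The arithmetic above can be checked exactly with Python's `fractions` module. This simply reproduces the calculation in the post, using its fictional figures and its own assumption that the three tests can be treated as independent.

```python
from fractions import Fraction

# P(ham | test failed) for each test, using the post's fictional figures.
p_ham = {
    "RBL listed": Fraction(5, 100),
    "suspicious HELO": Fraction(20, 100),
    "SPF fail": Fraction(40, 100),
}

p_all_three = Fraction(1)
for p in p_ham.values():
    p_all_three *= p   # multiply, treating the tests as independent

print(p_all_three)              # 1/250
print(float(1 - p_all_three))   # 0.996
```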

Now I'm neither a statistician nor an expert in fighting spam, so I'm sure there
are some flaws in this idea somewhere. One of them is probably that the various
tests available are not statistically independent. But as a basic principle, is
there mileage in this, or should I stick with addition, or find another way of
weighting stuff altogether?
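One observation on addition versus multiplication: because log(a·b·c) = log a + log b + log c, an additive score can reproduce the product if each test's weight is the negative logarithm of its probability. The sketch below illustrates that identity with the post's fictional figures; the `-log10` weighting is an assumption chosen for illustration, not something taken from SpamAssassin or any existing tool, and it still relies on the tests being independent.

```python
import math

# Fictional P(ham | test failed) figures from the post.
p_ham = {"rbl": 0.05, "helo": 0.2, "spf": 0.4}

# Hypothetical additive weights: -log10 of each probability.
weights = {test: -math.log10(p) for test, p in p_ham.items()}


def ham_probability_from_score(failed):
    score = sum(weights[t] for t in failed)  # plain addition of weights
    return 10 ** -score                      # undo the logarithm


print(round(ham_probability_from_score(["rbl", "helo", "spf"]), 6))  # 0.004
```

So an existing "add up the points" system is not necessarily wrong; the question is whether its weights behave like logarithms of probabilities.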

I'm actually going to be away from my computer for the next ten days, so I
apologise if I don't promptly respond to your responses, but rest assured I
will read them with great interest when I get back...

Thanks
-- 
Chris Hastie
