Ted Mittelstaedt wrote:
Marc Perkel wrote:
I've brought this idea up over the years but I'll try to explain it
in a different way. Maybe we can do this with a lot of meta rules.
What we need are rules that combine a lot of simple rules into
concepts and then combine those rules into rules that score - and
score big. As an example, let's take a standard Nigerian scam email.
From <> reply to:
[I don't know you] Dear stranger, I am mr, ms. mrs. my name is
[I am connected] I am a soldier in Iraq, I am the daughter of an
African president, I work at a bank in Hong Kong
[I have money] I have the sum of 56 million dollars USD
[the money is hot] no beneficiaries, sneak it out of the country,
oppressive regime
[transfer to your account] splitting the funds, wire to your account
[I need your information] name, address, account number
[I want you to contact me] by email, phone
[keep this a secret] confidential discretion
So - we create a lot of simple rules with no points, using key words
and phrases, and then combine these rules using meta rules to get these
concepts. That way we have meta rules like "they don't know me",
"they are talking about transferring millions", "they want my
information", and "they are talking about hot money". Then you combine
those concepts into rules that can definitively determine it is spam.
And - I am still looking for someone who might do Bayesian or some
other automatic system that looks for rule combinations and increases
scores based on that.
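The layering described above can be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin rule syntax: the names CONCEPTS, concepts_hit and meta_score are hypothetical, the patterns are lifted loosely from the keyword list above, and the 10-point value is an arbitrary assumption.

```python
import re

# Hypothetical zero-score concept rules. The keywords come from the scam
# outline above; the regexes themselves are illustrative assumptions.
CONCEPTS = {
    "stranger": r"dear stranger|my name is",
    "has_money": r"million dollars|the sum of",
    "hot_money": r"no beneficiar|out of the country|oppressive regime",
    "transfer": r"your account|splitting the funds",
    "wants_info": r"account number|address",
    "secret": r"confidential|discretion",
}


def concepts_hit(text):
    """Return the set of concept names whose pattern matches the text."""
    return {name for name, pat in CONCEPTS.items()
            if re.search(pat, text, re.I)}


def meta_score(text,
               required=("stranger", "has_money", "hot_money", "wants_info"),
               points=10.0):
    """Score only when every required concept fires; a lone concept scores 0."""
    return points if set(required) <= concepts_hit(text) else 0.0


scam = ("Dear stranger, I have the sum of 56 million dollars USD. "
        "There are no beneficiaries. "
        "Send your name, address and account number.")
ham = "Please update your address with the bank."

print(meta_score(scam), meta_score(ham))
```

The ham line trips one concept ("wants_info") and still scores nothing, which is the behaviour the proposal is after: the concepts carry no points until they occur together.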
I know that it seems like the idea of building up "meta" rules from
a lot of small rules will give you a more accurate hit rate, but
this is one of those non-intuitive things where statistical
mathematics shows that the concept won't work. Or
rather, it won't work any better than the existing paradigm.
In other words, the current system of assigning little points to
a lot of little rules will yield the same result for any given
set of spam messages as organizing all these small rules into
groups that have bigger point values.
The only thing the organization does is help humans understand
what is going on better. This is because how humans think about
math like statistics is a lot different from how a computer
works with mathematics like statistics.
Ted
I think you are missing my point. Here's an example.
Mentions God/Christianity = 0
Mentions Nigeria = 0
Mentions Bank = 0
Mentions Funds = 0
Mentions all 4 = 100
This is simplistic but it makes my point.
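To put numbers on the example above, here is a minimal Python sketch. The rule names and both scoring functions are hypothetical stand-ins, not how any particular filter implements scoring. It contrasts a plain sum of per-rule points with a rule that fires only when all four parts match.

```python
# Hypothetical rule names mirroring the four-line example above.
PARTS = ("god", "nigeria", "bank", "funds")


def additive_score(hits, weights):
    """Sum the per-rule points for every rule that hit."""
    return sum(weights[rule] for rule in hits)


def conjunction_score(hits, points=100):
    """Award the points only if all four parts hit, otherwise nothing."""
    return points if all(rule in hits for rule in PARTS) else 0


zero_weights = {rule: 0 for rule in PARTS}
even_weights = {rule: 25 for rule in PARTS}
all_four = set(PARTS)
three = {"nigeria", "bank", "funds"}

# With zero-point parts, a plain sum can never reach 100.
print(additive_score(all_four, zero_weights), conjunction_score(all_four))
# Spreading the 100 points evenly makes a three-part message score 75,
# where the combined rule gives it 0.
print(additive_score(three, even_weights), conjunction_score(three))
```

Under these assumptions the two schemes disagree on partial matches, which is the distinction the example is drawing. Whether that difference improves the hit rate on real mail is a separate question that the sketch does not answer.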