On Fri, 9 Oct 2009, Marc Perkel wrote:
What we need are rules that combine a lot of simple rules into concepts and
then combine those rules into rules that score - and score big. As an
example, let's take a standard Nigerian scam email.
From <> reply to:
[I don't know you] Dear stranger, I am Mr., Ms., Mrs., my name is
[I am connected] I am a soldier in Iraq, I am the daughter of an African
president, I work at a bank in Hong Kong
[I have money] I have the sum of 56 million dollars USD
[the money is hot] no beneficiaries, sneak it out of the country, oppressive
regime
[transfer to your account] splitting the funds, wire to your account
[I need your information] name, address, account number
[I want you to contact me] by email, phone
[keep this a secret] confidential discretion
So - we create a lot of simple zero-score rules for keywords and phrases,
and then combine these rules using meta rules to get these concepts. That way
we have meta rules like "they don't know me", "they are talking about
transferring millions", "they want my information", "they are talking about
hot money". Then you combine those concepts into rules that can definitively
determine it is spam.
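
A minimal sketch of this layering, written in Python rather than real
SpamAssassin rule syntax, might look like the following. Every rule name,
pattern, and score here is a made-up illustration, not a shipped rule:
subrules carry no score, concepts fire when any of their subrules hit, and
only the combination of several concepts scores big.

```python
import re

# Zero-score subrules: each only records whether a phrase family appears.
# All names and patterns are hypothetical examples.
SUBRULES = {
    "GREETS_STRANGER": re.compile(r"\bdear (stranger|friend|sir)\b", re.I),
    "BIG_SUM": re.compile(r"\b\d+(\.\d+)? million (us )?dollars\b", re.I),
    "NO_BENEFICIARY": re.compile(r"\bno (known )?beneficiar(y|ies)\b", re.I),
    "WIRE_TO_YOU": re.compile(r"\b(wire|transfer)\w* .{0,40}\byour account\b", re.I),
    "WANTS_DETAILS": re.compile(r"\baccount number\b", re.I),
    "KEEP_SECRET": re.compile(r"\b(confidential|discretion)", re.I),
}

# Concepts: a concept is met when any one of its subrules hits.
CONCEPTS = {
    "DOES_NOT_KNOW_ME": ["GREETS_STRANGER"],
    "HAS_MONEY": ["BIG_SUM"],
    "HOT_MONEY": ["NO_BENEFICIARY", "KEEP_SECRET"],
    "WANTS_MY_INFO": ["WANTS_DETAILS", "WIRE_TO_YOU"],
}


def score_message(text, per_concept=1.0, bonus=5.0, threshold=3):
    """Return (concepts met, score); the bonus applies at >= threshold concepts."""
    hits = {name for name, rx in SUBRULES.items() if rx.search(text)}
    concepts = {c for c, subs in CONCEPTS.items() if hits & set(subs)}
    score = per_concept * len(concepts)
    if len(concepts) >= threshold:
        score += bonus  # the "score big" step for combined concepts
    return concepts, score
```

Calling score_message() on a message body returns the set of concepts it
matched and the total score, so a single stray phrase stays cheap while a
message hitting three or more concepts picks up the bonus.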
And - I am still looking for someone who might build a Bayesian or some other
automatic system that looks for rule combinations and increases scores based
on that.
That's exactly what I'm doing right now with the ADVANCE_FEE rules (which
I did _not_ originate - I'm only freshening them). The structure for
automatic meta generation is there.
The effort is in generating the subrules and deciding which ones are
generally related to each other.
The former can't really be automated, but Justin's giving it a shot with
SOUGHT. It is a lot of work to get broadly good results even for basic
rules.
The latter would be a good research project; it could trivially be done
right now based on the existing evolver if you simply fed it _all_ of the
existing rules to use as its base, and (for example) kept every evolved
rule set whose fitness was > 100000 (or whatever turns up as a good cutoff
point). Culling overlap would be an interesting exercise.
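
To make the "keep everything above a cutoff" idea concrete, here is a rough
Python sketch. It is a brute-force stand-in that enumerates fixed-size rule
combinations, not the existing evolver, and the fitness function, the ham
penalty, and the input format (one set of rule hits per message plus a spam
flag) are all assumptions made for illustration.

```python
from itertools import combinations


def evaluate(combo, messages):
    """Count spam and ham messages that hit every rule in combo."""
    spam = sum(1 for hits, is_spam in messages if is_spam and combo <= hits)
    ham = sum(1 for hits, is_spam in messages if not is_spam and combo <= hits)
    return spam, ham


def fitness(spam, ham, ham_penalty=10):
    # Hypothetical fitness: reward spam hits, punish ham hits heavily.
    return spam - ham_penalty * ham


def find_combos(messages, size=2, cutoff=2):
    """Return (combo, fitness) pairs whose fitness exceeds cutoff, best first."""
    rules = sorted(set().union(*(hits for hits, _ in messages)))
    kept = []
    for combo in combinations(rules, size):
        spam, ham = evaluate(frozenset(combo), messages)
        value = fitness(spam, ham)
        if value > cutoff:
            kept.append((combo, value))
    return sorted(kept, key=lambda item: -item[1])
```

The number of combinations grows quickly with the rule count and the
combination size, which is where the hardware requirement comes from; culling
overlapping combinations is not attempted here.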
It's an interesting idea, but right now I don't quite have the hardware to
try doing it. Anybody care to order a refurbished 4-core Phenom off
TigerDirect for me? :) ( <- not serious )
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79