On Fri, 9 Oct 2009, Marc Perkel wrote:

What we need are rules that combine a lot of simple rules into concepts and then combine those rules into rules that score - and score big. As an example, let's take a standard Nigerian scam email.

From <> reply to:

[I don't know you] Dear stranger, I am mr, ms. mrs. my name is

[I am connected] I am a soldier in Iraq, I am the daughter of an African president, I work at a bank in Hong Kong

[I have money] I have the sum of 56 million dollars USD

[the money is hot] no beneficiaries, sneak it out of the country, oppressive regime

[transfer to your account] splitting the funds, wire to your account

[I need your information] name, address, account number

[I want you to contact me] by email, phone

[keep this a secret] confidential discretion

So - we create a lot of simple zero-score rules built on key words and phrases, and then combine these rules using meta rules to get these concepts. That way we have meta rules like "they don't know me", "they are talking about transferring millions", "they want my information", "they are talking about hot money". Then you combine those concepts into rules that can definitively determine it is spam.
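To make the layering concrete, here is a rough sketch in Python (deliberately not SpamAssassin rule syntax). Every rule name, pattern and score below is a made-up assumption for illustration, not an existing rule: the subrules carry no points of their own, and only the meta rules that combine them add to the score.

```python
import re

# Zero-score subrules. Names and patterns are hypothetical examples only.
SUBRULES = {
    "STRANGER": re.compile(r"\bdear (stranger|friend|sir)\b", re.I),
    "BIG_MONEY": re.compile(r"\b\d+(\.\d+)? million (dollars|usd)\b", re.I),
    "HOT_MONEY": re.compile(
        r"\b(no beneficiar(y|ies)|out of the country|oppressive regime)\b", re.I
    ),
    "TRANSFER": re.compile(r"\b(wire|transfer)\b.{0,40}\byour account\b", re.I),
    "WANT_INFO": re.compile(r"\b(account number|full name|address)\b", re.I),
    "SECRET": re.compile(r"\b(confidential|discretion|secret)\b", re.I),
}

# Meta rules: name -> (subrules that must all hit, points). Scores are invented.
META_RULES = {
    "META_UNKNOWN_SENDER_MONEY": ({"STRANGER", "BIG_MONEY"}, 1.0),
    "META_HOT_TRANSFER": ({"BIG_MONEY", "HOT_MONEY", "TRANSFER"}, 3.0),
    "META_FULL_419": (
        {"STRANGER", "BIG_MONEY", "HOT_MONEY", "TRANSFER", "WANT_INFO"},
        5.0,
    ),
}


def subrule_hits(text):
    """Return the names of all subrules whose pattern matches the text."""
    return {name for name, pattern in SUBRULES.items() if pattern.search(text)}


def score(text):
    """Return (total points, sorted names of the meta rules that fired)."""
    hits = subrule_hits(text)
    fired = {
        name: points
        for name, (needed, points) in META_RULES.items()
        if needed <= hits  # every required subrule matched
    }
    return sum(fired.values()), sorted(fired)
```

A message that only mentions wiring money to an account hits one subrule and scores nothing; the points only show up once several concepts appear together.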

And - I am still looking for someone who might do Bayesian or some other automatic system that looks for rule combinations and increases scores based on that.

That's exactly what I'm doing right now with the ADVANCE_FEE rules (which I did _not_ originate - I'm only freshening them). The structure for automatic meta generation is there.

The effort is in generating the subrules and deciding which ones are generally related to each other.

The former can't really be automated, but Justin's giving it a shot with SOUGHT. It is a lot of work to get broadly good results even for basic rules.

The latter would be a good research project; it could trivially be done right now based on the existing evolver if you simply fed it _all_ of the existing rules to use as its base, and (for example) kept every evolved rule set whose fitness was > 100000 (or whatever turns up as a good cutoff point). Culling overlap would be an interesting exercise.
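As a toy illustration of that last idea, a sketch in Python follows. It does not reproduce the evolver: it swaps in plain exhaustive enumeration of small rule combinations, and the fitness function, the cutoff, the rule names and the four-message corpus are all assumptions invented for the example. Each corpus entry is the set of rules that hit a message plus a spam/ham label.

```python
from itertools import combinations


def fitness(combo, corpus, ham_penalty=10):
    """Spam messages hit by every rule in combo, minus a heavy penalty per ham hit.

    This formula is an assumption for illustration, not the evolver's fitness.
    """
    spam = sum(1 for hits, is_spam in corpus if is_spam and combo <= hits)
    ham = sum(1 for hits, is_spam in corpus if not is_spam and combo <= hits)
    return spam - ham_penalty * ham


def find_combos(corpus, max_size=3, cutoff=1):
    """Keep every rule combination whose fitness is above the cutoff."""
    rules = sorted(set().union(*(hits for hits, _ in corpus)))
    kept = []
    for size in range(2, max_size + 1):
        for combo in map(frozenset, combinations(rules, size)):
            # Cull overlap: skip combos that merely extend one already kept.
            if any(smaller <= combo for smaller in kept):
                continue
            if fitness(combo, corpus) > cutoff:
                kept.append(combo)
    return kept


# Hypothetical corpus: (rules that hit the message, is_spam).
corpus = [
    (frozenset({"BIG_MONEY", "HOT_MONEY", "TRANSFER"}), True),
    (frozenset({"BIG_MONEY", "HOT_MONEY", "SECRET"}), True),
    (frozenset({"BIG_MONEY", "TRANSFER"}), False),
    (frozenset({"SECRET"}), False),
]

combos = find_combos(corpus)
```

Exhaustive enumeration blows up quickly as the rule count grows, which is exactly why a real run over _all_ of the existing rules would want an evolutionary search and a lot more hardware. The superset check here is only the crudest possible form of overlap culling.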

It's an interesting idea, but right now I don't quite have the hardware to try doing it. Anybody care to order a refurbished 4-core Phenom off TigerDirect for me? :) ( <- not serious )

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Judicial Activism (n): interpreting the Constitution to grant the
  government powers that are popularly felt to be "needed" but that
  are not explicitly provided for therein (common definition);
  interpreting the Constitution as it is written (Brady definition)
-----------------------------------------------------------------------
 8 days since a sunspot last seen - EPA blames CO2 emissions
