Re: A New Approach: Find the Ham

Dan Sat, 10 Feb 2007 12:38:16 -0800

On Feb 10, 2007, at 12:14, Miles Fidelman wrote:

> Dan wrote:
>> I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory:
>>
>> NEW ASSUMPTION
>> All messages are spam unless x,y,z score says they're ham.
>>
>> NEW APPROACH
>> Block everything, then create rules to not catch what you do want, i.e., build tests that target the spam (keeping all the tests you've already built), then score the thousands of ways ham triggers those tests.
>
> It strikes me that the biggest problem with this approach is filtering out too much ham. At least for me, it's more important to make sure that people reach me than to filter out all spam. If we take the approach that everything is filtered out except x,y,z, then the risk of filtering out too much seems pretty high.

Actually, [unparalleled] accuracy is built into this approach. Currently, when a ham gets caught, you either take out the rule that caught it or make a whitelist entry.


        Lots of ongoing work = little cumulative return

With Find the Ham, whitelisting is almost obsolete. When you find an FP, you make an exception for that specific profile, the permutation of tests/rules that caught the message, so this specific assembly doesn't catch any more ham. The rule stays at full strength for every other permutation, and no whitelist is needed.
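A minimal Python sketch of the per-profile exception idea, under loudly stated assumptions: a message's "profile" is modeled as the unordered set of rule names it triggered (so "permutation" is read as "combination"), scoring is left out entirely, and a message counts as spam by default unless its exact profile has been exempted. The class, method, and rule names are hypothetical illustrations, not the API of any existing filter and not Dan's actual implementation.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical model: a profile is the exact combination of rules a message triggered.
Profile = FrozenSet[str]


class FindTheHamFilter:
    """Hypothetical sketch: everything is spam unless its exact profile is exempted."""

    def __init__(self) -> None:
        # Profiles that were reported as false positives and confirmed as ham.
        self.exceptions: Set[Profile] = set()

    @staticmethod
    def profile(rules_hit: Iterable[str]) -> Profile:
        # Order of rule hits is ignored; only which rules fired matters.
        return frozenset(rules_hit)

    def is_spam(self, rules_hit: Iterable[str]) -> bool:
        # Default-spam: a message is ham only if its exact profile was exempted.
        return self.profile(rules_hit) not in self.exceptions

    def report_false_positive(self, rules_hit: Iterable[str]) -> None:
        # Exempt only this combination; every rule still applies to other combinations.
        self.exceptions.add(self.profile(rules_hit))


if __name__ == "__main__":
    f = FindTheHamFilter()
    print(f.is_spam(["RULE_A", "RULE_B"]))            # True (no exceptions yet)
    f.report_false_positive(["RULE_A", "RULE_B"])
    print(f.is_spam(["RULE_B", "RULE_A"]))            # False (same profile, now exempt)
    print(f.is_spam(["RULE_A", "RULE_B", "RULE_C"]))  # True (different profile)
```

Because an exception matches only the exact profile, a message that trips the same rules plus one more is still blocked, so each rule keeps its full strength outside the exempted combination.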

This training process is the best part of the whole approach. It begins with a huge number of FPs, but significant improvements take only a few weeks. After a few months (depending on the diversity of your ham), FPs are very, very rare.


        Little ongoing work = huge cumulative return


Dan
