My new method for blocking spam - REVEALED!

Marc Perkel Wed, 20 Jan 2016 08:53:09 -0800

OK - following up on this. I have my provisional patent filed. I'm still doing development to improve it and working on a licensing contract. But the license will be based on the Creative Commons patent with some restrictions added. Basically I want to get a license fee from the big guys and my spam filtering competitors. So unless you are in the spam filtering business or have more than 10,000 email addresses it's not going to cost you anything.

I'm going to describe the concept here. I'm not going to share my code because my code is specific to my system and it is a combination of bash scripts, Redis, Pascal, PHP, and Exim rules. And the open source programmers are likely to implement it better than I have. Basically I'm trying not to put myself out of business and this new method is a bigger breakthrough than Bayesian filtering.


Maybe I should call it a new plan for spam?

So - I'm just going to introduce the concept right now about how it works. Once you know what I'm doing it should be easy to implement. I had it working in a couple of days and I'm not an outstanding programmer. One thing to keep in mind is this is a paradigm shift. It's not about matching - *it's about NOT matching*. And although it is far better at catching spam, its best feature is actively identifying good email.


The secret sauce

Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's get some lunch". In fact there are an infinite number of words and phrases that are used in good email that are never ever used in spam. And if I'm using words and phrases *never used in spam* that are used in ham - it's good email. And similarly - if I'm using words and phrases that are used in spam and *never used in ham* - it's spam.

So - how do I get a list of words and phrases never used in spam? I create a list of words and phrases that are used in spam and check to see if it's *not on the list*.

What I do is tokenize the spammiest parts of the email, like the subject line, into phrases of 1, 2, 3, and 4 words.


the quick brown fox jumps over the lazy dog - becomes

"the" "quick" "the quick" "brown" "quick brown" "the quick brown" "fox" "brown fox" "quick brown fox" "the quick brown fox" "jumps" "fox jumps" "brown fox jumps" "quick brown fox jumps" "over" "jumps over" "fox jumps over" "brown fox jumps over" "the" "over the" "jumps over the" "fox jumps over the" "lazy" "the lazy" "over the lazy" "jumps over the lazy" "dog" "lazy dog" "the lazy dog" "over the lazy dog"
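The tokenizing step can be sketched in a few lines of Python. This is only an illustration, not the code described in this post: the function name tokenize and the max_len parameter are made up for the example, and lowercasing plus splitting on whitespace is an assumption about how words are separated.

```python
def tokenize(text, max_len=4):
    """Return every phrase of 1..max_len consecutive words, in reading order."""
    words = text.lower().split()  # assumption: whitespace-separated, case-folded
    tokens = []
    for end in range(len(words)):
        # Emit the phrases that end at this word, shortest first.
        for n in range(1, max_len + 1):
            start = end - n + 1
            if start < 0:
                break
            tokens.append(" ".join(words[start:end + 1]))
    return tokens


tokens = tokenize("the quick brown fox jumps over the lazy dog")
print(len(tokens))       # 30 phrases, in the same order as the list above
print(len(set(tokens)))  # 29 once the repeated "the" collapses into a set
```

Since the tokens end up in sets anyway, the duplicate "the" costs nothing.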

These tokens are learned as ham or spam and added to sets. I'm using Redis to do this because it has extremely fast set operations. I don't know of anything other than Redis that can do this. So think about Redis as the way to implement this.
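To make the learning step concrete, here is a small sketch that uses plain Python sets as a stand-in for the Redis sets, so it runs without a server. The names ham_corpus, spam_corpus and learn are hypothetical. In Redis the update would presumably be an SADD into a ham or spam key, but the key layout is not described in this post.

```python
ham_corpus = set()   # every token ever seen in good mail
spam_corpus = set()  # every token ever seen in spam


def learn(tokens, is_spam):
    """Add a message's tokens to the matching corpus."""
    target = spam_corpus if is_spam else ham_corpus
    target.update(tokens)


learn(["let's", "get", "let's get", "lunch"], is_spam=False)
learn(["cheap", "pills", "cheap pills", "get"], is_spam=True)

# Tokens seen in ham that are NOT on the spam list.
never_in_spam = ham_corpus - spam_corpus
print(sorted(never_in_spam))
```

Note that "get" drops out because it was seen on both sides, which is the point of the method: only tokens that are missing from the other list carry weight.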

A new message comes in. It is tokenized and fingerprinted and hundreds of fingerprints are generated. Then it's all set operations. The set of fingerprints of the test message is intersected with the spam and ham corpora, creating subsets of matches. Then you do a set diff both ways (ham - spam) (spam - ham) and whichever side is bigger wins. Generally it will match on only one side or very predominantly on one side.
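The scoring step above might look like the following, again with Python sets standing in for Redis (where the same steps would map onto set intersection and difference commands such as SINTER and SDIFF). The function name classify and the "unsure" result for a tie are assumptions added for this sketch; the post does not say what happens when both sides are equal.

```python
def classify(message_tokens, ham_corpus, spam_corpus):
    """Label a message by which side has more one-sided matches."""
    msg = set(message_tokens)
    ham_hits = msg & ham_corpus        # message tokens seen in ham
    spam_hits = msg & spam_corpus      # message tokens seen in spam
    ham_only = ham_hits - spam_hits    # (ham - spam)
    spam_only = spam_hits - ham_hits   # (spam - ham)
    if len(ham_only) > len(spam_only):
        return "ham"
    if len(spam_only) > len(ham_only):
        return "spam"
    return "unsure"  # assumption: ties are left undecided


ham = {"lunch", "let's get", "some lunch", "get"}
spam = {"cheap", "pills", "cheap pills", "get"}
subject = ["let's", "get", "some", "lunch", "let's get", "get some", "some lunch"]

print(classify(subject, ham, spam))                           # ham
print(classify(["cheap", "pills", "cheap pills"], ham, spam))  # spam
```

In the first call "get" matches both corpora and cancels out, leaving three ham-only tokens against zero spam-only tokens.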

So I'm not just tokenizing the subject. Also the first 25 words of the message, the text of links in the message, the name part of the from address, the header names, the attachment names, the PHP script if there is one, and various behavior characteristics (slow, no quit, no RDNS, number of MIME parts, multiple recipients, etc.).
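One possible way to combine several parts of a message into a single fingerprint set is sketched below. Everything about the layout is an assumption made for illustration: the part names, the fingerprints and ngrams helpers, and especially the "part:token" prefix, which keeps a phrase in the subject distinct from the same phrase in the body. The post does not say whether the real system tags tokens this way.

```python
def ngrams(text, max_len=4):
    """Set of all 1..max_len word phrases in text."""
    words = text.lower().split()
    return {
        " ".join(words[start:start + n])
        for n in range(1, max_len + 1)
        for start in range(len(words) - n + 1)
    }


def fingerprints(parts, body_words=25):
    """Build one fingerprint set from a dict of message parts."""
    prints = set()
    for name, text in parts.items():
        if name == "body":
            # Only the first 25 words of the message body are used.
            text = " ".join(text.split()[:body_words])
        # Hypothetical tagging scheme: prefix each token with its part name.
        prints |= {f"{name}:{tok}" for tok in ngrams(text)}
    return prints


message = {
    "subject": "Let's get some lunch",
    "from_name": "Alice Example",
    "body": "Are you free on Thursday",
}
prints = fingerprints(message)
print(len(prints))
```

Behavior characteristics such as "no RDNS" could be added to the same set as fixed tokens, though how they are encoded is not described here.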

SpamAssassin is all about matching rules. This is all about not matching. Not matching allows you to compare to an infinite set rather than a finite set. So when spammers start misspelling words to not match the rules, my system catches that and makes its own rules. The tricks that spammers use now make it easier to catch them using this method.

I will post a link to a better explanation later when I write one. But I wanted to let you all know this wasn't just a tease from some crazy person.


So - here's what I want to see happen.

I'd like to see SA implement this. I will provide a license to include with it giving most people a free license, sort of like how Spamhaus isn't free to everyone, but it's in SA. Then the new method will take off and eventually I'll get a little something for this.

This new method (I'm calling it the Evolution Spam Filter because the algorithm mimics evolution) doesn't just block spammers, it decimates spammers. It's not just a treatment - it's the cure. I hate spam and although I could have kept this secret and made money having the best spam filter on the planet, I decided I had a moral obligation to make this generally available. I think this will save the global economy billions of dollars in recovered productivity and crime and fraud prevention.

I'm seeing close to 100% accuracy. It is so accurate it's scary and I think my implementation is crude at best. I think if it were done right it could even get closer to 100% than I have. Once you wrap your brain around the concept it's almost scary how well it works.

The side effect is that this is a very fast and simple recursive learner. What happens is that as people converse by email it learns more words and phrases about the stuff that people talk about that are never used in spam. It doesn't have to know what language you are using, it will learn it on its own. It's like having SA with 100 million accurate rules where it writes new rules itself.


I will leave you with that and I'll have more later.
