Thanks for this. It'll be useful to show the next person who tries to convince me software patents are a good idea.
Sent from my mobile. Please excuse any unusual brevity or typos while I'm on the go.

> On 22 Jan 2016, at 7:48 AM, Marc Perkel <supp...@junkemailfilter.com> wrote:
>
> Just to follow up on this: I'm in the process of improving the filter, but I
> have filed my provisional patent, so I'm going to give you an overview of how
> it works.
>
> Most spam filters work by matching things: matching ham and spam, matching
> rules. The important point here is that this new system, which I'm calling the
> Evolution filter, is about NOT matching.
>
> Suppose I sent you an email with the subject line "Let's get dinner". You can
> tell instantly this is good email. How? Because spammers never say "Let's get
> dinner".
>
> There are millions of phrases used in good email every day that are never
> used in spam, and there are millions of phrases used every day in spam that
> are never used in good email. So if I get an email that matches phrases used
> in good email and never used in spam, it's a good message. And if the
> message contains words and phrases used in spam and never used in ham, it's
> spam.
>
> So how do I get a list of all phrases never used in ham or never used in
> spam? I make a list of all words and phrases used in ham and spam and test
> whether a phrase is NOT in the list. To illustrate my point:
>
> Here is a list of 5,505,874 words and phrases used in the subject line of HAM
> and never seen in the subject line of SPAM:
>
> http://www.junkemailfilter.com/data/subject-ham.txt
>
> Here is a list of 3,494,938 words and phrases used in the subject line of SPAM
> and never seen in the subject line of HAM:
>
> http://www.junkemailfilter.com/data/subject-spam.txt
>
> The thing about not matching is that matching involves finite sets. Not
> matching involves infinite sets. And infinite sets are always bigger than
> finite sets.
>
> Here is a link to my patent:
>
> http://www.junkemailfilter.com/patent/
>
> What I intend to do is give it away to the little guys and charge the big
> guys a small license fee. The process of implementing this is fairly easy.
> I'm hoping to encourage the open source world to take this idea and do it
> right. My code is cobbled together and uses 4 different languages, but the
> concept is enough to get you going.
>
> One thing you will need to implement this is Redis. Redis is extremely fast
> at set comparisons, and set comparisons are how this works. It can be
> expressed as one formula:
>
> score = card(SpamCorpus intersect TestMessage diff HamCorpus) -
>         card(HamCorpus intersect TestMessage diff SpamCorpus)
>
> I'm seeing an accuracy level that is so close to 100% it's scary. It is
> especially good at actively identifying good email to prevent false positives.
>
> I will post more soon as it all comes together.
>
> _______________________________________________
> mailop mailing list
> mailop@mailop.org
> https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
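The scoring formula in the quoted message can be sketched with ordinary Python sets, where card() is set size, "intersect" is & and "diff" is -. This is a minimal illustration under stated assumptions, not Marc's implementation: the `phrases` tokenizer (lowercase word n-grams of length 1 to 3 from the subject line), the `MAX_NGRAM` constant and the toy training subjects are all hypothetical, since the post doesn't say how a "phrase" is defined or how the corpora are built.

```python
import re
from typing import Iterable, Set

# Hypothetical choice: a "phrase" is any run of 1 to 3 consecutive words.
# The original post does not define phrase length.
MAX_NGRAM = 3


def phrases(subject: str, max_n: int = MAX_NGRAM) -> Set[str]:
    """Return the set of lowercase word n-grams (lengths 1..max_n) in a subject."""
    words = re.findall(r"[a-z0-9']+", subject.lower())
    out: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            out.add(" ".join(words[i:i + n]))
    return out


def build_corpus(subjects: Iterable[str]) -> Set[str]:
    """Union of every phrase seen in a collection of subject lines."""
    corpus: Set[str] = set()
    for s in subjects:
        corpus |= phrases(s)
    return corpus


def score(test_subject: str, spam_corpus: Set[str], ham_corpus: Set[str]) -> int:
    """card((Spam & Test) - Ham) - card((Ham & Test) - Spam).

    Positive: the message has more spam-only phrases than ham-only phrases.
    Negative: the reverse. Phrases seen in both corpora count for neither side.
    """
    test = phrases(test_subject)
    spam_only_hits = (spam_corpus & test) - ham_corpus
    ham_only_hits = (ham_corpus & test) - spam_corpus
    return len(spam_only_hits) - len(ham_only_hits)


if __name__ == "__main__":
    ham = build_corpus(["Let's get dinner", "Meeting notes for Tuesday"])
    spam = build_corpus(["Cheap meds online", "Get rich quick"])
    print(score("Let's get dinner on Tuesday", ham_corpus=ham, spam_corpus=spam))  # prints -6
    print(score("Cheap meds, get rich", ham_corpus=ham, spam_corpus=spam))  # prints 5
```

Note that "get" appears in both toy corpora, so it is excluded from both sides of the score; that is the "never used in the other corpus" idea from the post. Redis, which the post recommends, provides the same operations server-side (SINTER, SDIFF, SCARD and their *STORE variants); in-memory sets stand in for it here only so the sketch runs without a server.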