I had a trick I was using in Exim that worked pretty well and could be recoded in Perl.

First, I had a list of correctly spelled words that spammers often deliberately misspell.

What I did was take the subject and the first 200 characters of the body. Then I removed all the words from the list that were spelled correctly. Then I translated the characters used to obfuscate back into the letters they were faking, and I removed all the junk spacing characters. This process "corrected" the spelling of the obfuscated words. I then compared the resulting string to my original list, and if I found a word, it was because it had been deliberately misspelled.
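The steps above can be sketched roughly as follows. Marc's version ran inside Exim and he suggests Perl; Python is used here only for illustration. The word list, the set of "junk spacing" characters, and the function names are hypothetical assumptions. The look-alike table borrows the character classes quoted further down in this thread, minus the class for "a", which did not survive the list archive. Because one quoted class stands for both "i" and "l", both letters are folded to "i", and the same folding is applied to the word list so the two sides still compare equal.

```python
import re

# Hypothetical word list: the original list was never posted.
WORDS = ["viagra", "cialis", "mortgage", "casino"]

# Look-alike characters from the classes quoted in this thread.
# "i" and "l" share one class, so "l" and its look-alikes fold to "i".
# The class for "a" was lost in the archive and is omitted.
FOLD = str.maketrans({"l": "i", "|": "i", "í": "i", "1": "i",
                      "é": "e", "3": "e", "0": "o"})

# Assumed "junk spacing" characters: whitespace plus punctuation used
# to pad letters apart, as in v.i.a.g.r.a or c-i-a-l-i-s.
JUNK = re.compile(r"[\s.\-_*,'\"]+")


def fold(text):
    """Lowercase the text and undo the look-alike substitutions."""
    return text.lower().translate(FOLD)


# Folded spelling of each word -> the word as it appears in the list.
FOLDED_WORDS = {fold(w): w for w in WORDS}


def find_misspelled(subject, body, limit=200):
    """Return list words that appear only in a disguised form."""
    # Step 1: the subject plus the first `limit` characters of the body.
    text = (subject + " " + body[:limit]).lower()
    # Step 2: drop correctly spelled occurrences so they are not counted.
    for w in WORDS:
        text = re.sub(r"\b" + re.escape(w) + r"\b", " ", text)
    # Steps 3 and 4: translate look-alikes, then strip junk spacing.
    text = JUNK.sub("", fold(text))
    # Step 5: any list word still present was deliberately disguised.
    return sorted(orig for folded, orig in FOLDED_WORDS.items() if folded in text)
```

For example, `find_misspelled("Cheap V1AGRA", "")` reports `viagra`, while `find_misspelled("Buy viagra now", "")` reports nothing because the honest spelling is removed before the search. Stripping every space is what lets `c-i-a-l-i-s` collapse back to a list word, but it can also glue two innocent neighbouring words into an accidental match, so a short or very common word list is safer with this approach.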

Kenneth Porter wrote:

--On Thursday, February 24, 2005 6:07 PM -0500 Phil Barnett <[EMAIL PROTECTED]> wrote:

i or l = [|ííiil1]

a = [EMAIL PROTECTED]

e = [eé3]

o = [o0]


It seems like this is getting overly complicated. Are there any libraries for fuzzy string matching and obfuscation detection that could be used instead of Perl regexes?


-- Marc Perkel - [EMAIL PROTECTED]

Spam Filter: http://www.junkemailfilter.com
   My Blog: http://marc.perkel.com
My Religion: http://www.churchofreality.org
~ "If it's real - we believe in it!" ~



