First, I had a list of correctly spelled words that spammers often deliberately misspell.
What I did was take the subject and the first 200 characters of the body. Then I removed all the words from the list that were spelled correctly. Next I translated the characters used to obfuscate back into the letters they were faking, and I removed all the junk spacing characters. This process "corrected" the spelling of the obfuscated words. I then compared the resulting string to my original list, and if I found a list word, it was because it had been deliberately misspelled.
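A minimal sketch of those steps in Python, assuming a tiny illustrative watch list and a hypothetical look-alike table (both `WATCH_WORDS` and `LOOKALIKES` are made up for this example, not the actual list or mapping I used):

```python
import re

# Hypothetical watch list: correctly spelled words that spammers misspell on purpose.
WATCH_WORDS = ["viagra", "cialis", "mortgage"]

# Assumed look-alike table mapping obfuscation characters to the letter they fake.
# Illustrative only; a real table would be larger and tuned against actual spam.
LOOKALIKES = str.maketrans({
    "0": "o", "3": "e", "é": "e",
    "1": "i", "|": "i", "í": "i", "!": "i",
    "@": "a", "4": "a", "á": "a",
    "$": "s", "5": "s",
})


def find_obfuscated_words(subject, body, watch_words=WATCH_WORDS):
    """Return the set of watch words that appear only in a deliberately misspelled form."""
    # Step 1: look at the subject plus the first 200 characters of the body.
    text = (subject + " " + body[:200]).lower()

    # Step 2: drop correctly spelled occurrences so they are not counted.
    for word in watch_words:
        text = re.sub(r"\b" + re.escape(word) + r"\b", " ", text)

    # Step 3: translate look-alike characters into the letters they imitate.
    text = text.translate(LOOKALIKES)

    # Step 4: strip junk spacing and punctuation so "v.i.a.g.r.a" collapses to "viagra".
    text = re.sub(r"[^a-z]", "", text)

    # Step 5: any watch word still present must have been obfuscated.
    return {word for word in watch_words if word in text}


if __name__ == "__main__":
    print(find_obfuscated_words("Cheap V1@gra", ""))
    print(find_obfuscated_words("Buy viagra now", ""))
```

Because step 4 removes all spacing, words can also run together across word boundaries, so a real filter would want to check the watch list for accidental matches of that kind.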
Kenneth Porter wrote:
--On Thursday, February 24, 2005 6:07 PM -0500 Phil Barnett <[EMAIL PROTECTED]> wrote:
i or l = [|ííiil1]
a = [EMAIL PROTECTED]
e = [eé3]
o = [o0]
It seems like this is getting overly complicated. Are there any libraries for fuzzy string matching and obfuscation detection that could be used instead of Perl regexes?
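For comparison, the per-letter character-class approach quoted above can be sketched in Python by swapping each letter of a word for its class. The classes for i/l, e and o come from the quoted lines; the class for "a" was lost in the archive, so `[a@4]` is an assumption, and `fuzzy_pattern` is a hypothetical helper name:

```python
import re

# Character classes from the quoted message. The class for "a" is an assumption,
# since the original was mangled by the list archive.
LOOKALIKE_CLASSES = {
    "i": "[|íil1]",
    "l": "[|íil1]",
    "a": "[a@4]",
    "e": "[eé3]",
    "o": "[o0]",
}


def fuzzy_pattern(word):
    """Build a case-insensitive regex matching `word` with look-alike substitutions."""
    parts = [LOOKALIKE_CLASSES.get(ch, re.escape(ch)) for ch in word.lower()]
    return re.compile("".join(parts), re.IGNORECASE)


if __name__ == "__main__":
    print(bool(fuzzy_pattern("viagra").search("Buy V1@GRA now")))
```

Note that this pattern also matches the correct spelling, so correctly spelled occurrences still have to be removed first if only deliberate misspellings should count, and it does not handle junk characters inserted between letters.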
-- Marc Perkel - [EMAIL PROTECTED]
Spam Filter: http://www.junkemailfilter.com My Blog: http://marc.perkel.com My Religion: http://www.churchofreality.org ~ "If it's real - we believe in it!" ~