Adding a Soundex module to check whether words "sound" like the
words SpamAssassin already penalizes might be an interesting
trick.
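
A minimal sketch of that idea, assuming plain American Soundex and a made-up sample word ("mortgage" is only an illustration, not a word from this thread). The function name `soundex` is hypothetical and this is not an existing SpamAssassin module:

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for a word."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                  # the first letter is kept as-is
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal codes
        code = codes.get(ch, "")         # vowels (and Y) get no code
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets the previous code
    return (result + "000")[:4]          # pad or truncate to 4 characters


# A doubled letter collapses to the same code as the clean spelling.
print(soundex("mortgage"), soundex("morttgage"))
```

Because Soundex keeps the first letter and only codes consonants, an obfuscation that changes the first letter or inserts a differently coded consonant would still slip past a check like this.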
{^_^}
----- Original Message ----- From: "Chris St. Pierre" <[EMAIL PROTECTED]>


One thing I've wondered/thought about is using the Levenshtein
distance between the words in an email and a list of spam words
(ideally pulled from the bayes db).  In this case, all of the
misspelled words in that sample have an L-distance of 1 from the real
word -- in other words, they're *very* close.
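
A rough sketch of that comparison, assuming the spam words are already available as a plain list (reading them out of the bayes db is not shown). The names `levenshtein` and `near_matches` and the sample words are hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def near_matches(words, spam_words, max_distance=1):
    """Return (word, spam_word) pairs within max_distance edits.

    Exact matches (distance 0) are included as well.
    """
    hits = []
    for word in words:
        for spam in spam_words:
            # Cheap pre-filter: the length gap is a lower bound on the distance.
            if abs(len(word) - len(spam)) > max_distance:
                continue
            if levenshtein(word.lower(), spam.lower()) <= max_distance:
                hits.append((word, spam))
    return hits


print(near_matches(["cheap", "morttgage", "rates"], ["mortgage"]))
```

Every word in the message is compared against every spam word, and each comparison costs time proportional to the product of the two word lengths, so the work grows quickly with the size of the word list.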

I think the problem would be that this would consume tons of
resources.  Anything else, though, would be susceptible to other typo
attacks.  For instance, if you took each email and replaced all
doubled letters with single letters, it wouldn't be long before you
were getting spam advertising "analr bictches" or the like.
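
To illustrate the weakness, here is a small sketch of the doubled-letter normalization, assuming a simple regex collapse. The function name `collapse_doubles` and the sample words are hypothetical:

```python
import re


def collapse_doubles(text: str) -> str:
    """Replace any run of a repeated letter with a single letter."""
    return re.sub(r"([a-z])\1+", r"\1", text, flags=re.IGNORECASE)


# The doubled letter is undone...
print(collapse_doubles("morttgage"))
# ...but an inserted stray letter passes through untouched.
print(collapse_doubles("mortxgage"))
```

The word list would need the same collapse applied, since legitimate doubles are flattened too ("better" becomes "beter"), and any obfuscation that inserts a different letter defeats it entirely.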

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

On Wed, 4 Oct 2006, Eric A. Hall wrote:


On 10/4/2006 5:57 PM, Richard Doyle wrote:
I've been getting lots of porn site spam containing words with doubled
letters, like this one:

Can anybody suggest a rule or ruleset to catch these double-letter
obfuscations? I'm using SpamAssassin 3.1.4.

You'd probably need to write a plug-in that used some kind of
typo-matching logic to find porno words.

Would be a good plug-in actually. Get busy :)

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
