On Tue, 20 Jan 2004 16:37:27 -0500 (EST), Charles Gregory <[EMAIL PROTECTED]> writes:

> I'm starting to see mail with TEXT obfuscation, such as:
>    I heard you need viagrPa. 
> Note the capital P thrown in to our favorite 'v' word.
> It is really beginning to look like we need a genuine spelling checker, or
> some sort of 'approximation' technology, if such exists. There is no
> 'pattern' I can think of to defeat this mis-spelling spam in any other
> way.

To catch obfuscations of a word, say abcfffg, we first expand it into:

a?bcfffg
ab?cfffg
abc?fffg
abcf?ffg
abcff?fg
abcfff?g
abcfffg?

to deal with any one single missing letter, and then put:
   ([ /_-]*|.?)

after each letter. That group can stand for either a run of separator
(non-word) characters *or* any single character.

Giving us seven lines like:

a?([ /_-]*|.?)b([ /_-]*|.?)c([ /_-]*|.?)f([ /_-]*|.?)f([ /_-]*|.?)f([ /_-]*|.?)g([ /_-]*|.?)  { 1; }
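
As a rough illustration of the expansion above, here is a minimal Python
sketch that builds the per-letter patterns for an arbitrary word. The
function names (obfuscation_patterns, matches_obfuscated) are made up
for this example, and using Python's re module with case-insensitive
search is my assumption, not something the rule format above requires.

```python
import re

# The group from the post: a run of separator characters, or at most
# one arbitrary character.
GAP = r"([ /_-]*|.?)"

def obfuscation_patterns(word):
    """Return one pattern per letter of word (hypothetical helper).

    In pattern number k, letter k is optional, which covers a single
    missing letter; every letter is followed by GAP.
    """
    patterns = []
    for skip in range(len(word)):
        parts = []
        for i, ch in enumerate(word):
            letter = re.escape(ch) + ("?" if i == skip else "")
            parts.append(letter + GAP)
        patterns.append("".join(parts))
    return patterns

def matches_obfuscated(word, text):
    """True if any generated pattern matches somewhere in text."""
    return any(re.search(p, text, re.IGNORECASE)
               for p in obfuscation_patterns(word))

if __name__ == "__main__":
    for p in obfuscation_patterns("abcfffg"):
        print(p)
    print(matches_obfuscated("viagra", "I heard you need viagrPa."))
```

Note that because the separator class includes a space and the search
is unanchored, these patterns also match across ordinary words (for
example "via grab" matches the patterns for "viagra"), so some
anchoring or scoring on top would likely be needed in practice.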

This, or variants of it, won't catch every word obfuscation, but
patterns like these should be somewhat more robust against false
positives and may make things a lot easier on Bayes.

A second option is to not do this as a rule, but do this sort of
obfuscation-analysis only on new tokens. If a token has never before
been seen, but it appears close to what seems to be an obfuscated
bad-word, we assign it a provisional spam-probability when doing
Bayesian analysis.
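
A minimal sketch of that second option, in Python. Everything here is
hypothetical: the function names, the dictionary of already-seen token
probabilities, and the 0.9 and 0.5 values are placeholders chosen for
illustration, not how any existing Bayes implementation works.

```python
import re

# Same group as in the rule above: separators, or one stray character.
GAP = r"([ /_-]*|.?)"

def looks_obfuscated(token, bad_word):
    """True if the whole token matches one of the per-letter patterns."""
    for skip in range(len(bad_word)):
        pattern = "".join(
            re.escape(ch) + ("?" if i == skip else "") + GAP
            for i, ch in enumerate(bad_word)
        )
        if re.fullmatch(pattern, token, re.IGNORECASE):
            return True
    return False

def token_probability(token, seen, bad_words,
                      provisional=0.9, unknown=0.5):
    """Spam probability for a token (all values are assumed placeholders).

    Tokens already in the database keep their learned probability. A
    never-seen token that looks like an obfuscated bad word gets the
    provisional value; any other new token gets the neutral default.
    """
    if token in seen:
        return seen[token]
    if any(looks_obfuscated(token, w) for w in bad_words):
        return provisional
    return unknown

if __name__ == "__main__":
    seen = {"hello": 0.1}
    for tok in ("hello", "viagrPa", "weather"):
        print(tok, token_probability(tok, seen, ["viagra"]))
```

Matching the whole token (rather than searching inside it) keeps the
check narrow, since it only ever runs on tokens the database has not
seen before.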

Scott

