On Fri, 30 Jan 2004 10:55:07 -0500, Matt Kettler <[EMAIL PROTECTED]> writes:

> Today I got an interesting form of obfuscation, apparently to avoid
> antidrug.cf.
> 
> 
> I'm not sure whether to bother with adding rules for this, or be
> satisfied that the obfuscations are so severe that the messages are
> now barely legible.

It's definitely barely legible, which is a VERY good thing.

> Orxder your Vjiagmra and Skupter Vimagera saifely and securfely onlijne.
> 
> Esntper Hekre

My guess is that it's probably best to let bayes deal with it, but
I'll speculate that bayes could deal with this better if the spam
probability of an unseen token were boosted the closer its edit
distance is to certain frequently obfuscated words.

Something like:

    http://search.cpan.org/~jhi/String-Approx-3.23/Approx.pm
or  http://www.merriampark.com/ldperl.htm

Plus some eval rules so that if a word is not in the bayes database,
but its edit distance from 'FOOBAR' is 2, it is given a spam
probability of .90, or if its edit distance from 'FOOBAR' is 1, it is
given a spam probability of .95.
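
To make that concrete, here is a rough sketch in Python (not
SpamAssassin code, and not how its bayes module works today). The
function names, the watched-word list and the known-token set are all
made up for illustration; only the .95 and .90 figures come from the
paragraph above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the usual two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Assumed mapping from edit distance to spam probability (figures from the text).
DISTANCE_TO_PROB = {1: 0.95, 2: 0.90}


def unseen_token_probability(token, watched_words, known_tokens):
    """Return a boosted probability for a token bayes has never seen.

    Returns None when the token is already known, or when it is not
    within distance 1 or 2 of any watched word.
    """
    token = token.lower()
    if token in known_tokens:
        return None  # let the normal bayes probability apply
    best = None
    for word in watched_words:
        prob = DISTANCE_TO_PROB.get(edit_distance(token, word.lower()))
        if prob is not None and (best is None or prob > best):
            best = prob
    return best
```

With 'FOOBAR' as the only watched word and an empty known-token set,
'fo0bar' is one edit away and 'f0obarr' is two, so they would get the
.95 and .90 values respectively. An exact match gets nothing here,
since a token like that should already be in the database.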

Well, it's just an idea.

Scott


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk