>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>Sent: Monday, February 28, 2005 10:34 AM
>To: Loren Wilton
>Cc: users@spamassassin.apache.org
>Subject: Re: Obfuscation (was: Millions and Billions)
>
>
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>
>Loren Wilton writes:
>> Since a tool can generate the matching pattern and convert it to a re, it
>> seems that a tool could in theory generate a matching pattern and convert it
>> to something else that might be either more comprehensible or more
>> efficient. Or possibly a tool could be made that would do a direct fuzzy
>> match from the unobfuscated word. (However, I think this last possibility
>> would be slower than pre-obfuscating; but possibly it wouldn't be.)
>>
>> The problem is that perl doesn't have any syntax to efficiently describe
>> this obfuscated match other than an incomprehensible regex.
>>
>> Someone could invent such a tool, and it could either be a plugin to SA or a
>> part (or addon subroutine) called by perl itself. In fact I believe that at
>> least two fuzzy matching plugins have been added to SA in the last week.
>> Whether they are as efficient, or more efficient, than the current horrid
>> re's is an interesting question.
>
>they actually generate the horrid REs internally. ;)
>
>A paper at the spam conference suggested using an Edit Distance algorithm
>with very good results; the idea being, the edit distance from "cialis" to
>"C 1 a l | s" isn't as far as it is to "specialized" or so on.
>
>if I recall correctly, someone submitted an implementation quite a while
>ago on our BZ, but I think the FP rates were too high. Given the
>recent paper's published results, though, it may be there are good ways
>to tweak it to get FPs at a tolerable rate.
>
>If anyone wants to have a try, please do ;)
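A minimal Python sketch of the edit-distance idea. Note that plain unit-cost Levenshtein distance does not produce the ranking described in the quote: even after lowercasing, "cialis" is 7 edits from "c 1 a l | s" but only 6 from "specialized". The sketch therefore assumes a weighted variant in which inserting a padding character or substituting a look-alike character is cheap. The look-alike table, separator set, cost values, and function names are all hypothetical; they are not taken from the paper or from any SpamAssassin plugin.

```python
"""Weighted edit distance for spotting obfuscated spam words (illustrative sketch)."""

# Hypothetical look-alike table: obfuscating character -> letters it may stand for.
LOOKALIKES = {"1": "il", "|": "il", "!": "i", "0": "o", "@": "a", "3": "e", "$": "s"}
# Hypothetical padding characters inserted between letters.
SEPARATORS = set(" .-_*")

# Integer costs keep the arithmetic exact; the values are guesses, not tuned.
MATCH, SEPARATOR, LOOKALIKE, FULL = 0, 1, 2, 10


def sub_cost(word_ch: str, text_ch: str) -> int:
    """Cost of aligning a letter of the clean word with a character of the text."""
    if word_ch == text_ch:
        return MATCH
    if word_ch in LOOKALIKES.get(text_ch, ""):
        return LOOKALIKE
    return FULL


def ins_cost(text_ch: str) -> int:
    """Cost of an extra text character that has no counterpart in the word."""
    return SEPARATOR if text_ch in SEPARATORS else FULL


def weighted_distance(word: str, text: str) -> int:
    """Dynamic-programming edit distance from `word` to `text` with custom costs."""
    word, text = word.lower(), text.lower()
    # Row 0: turning the empty word into text[:j] takes j insertions.
    prev = [0]
    for ch in text:
        prev.append(prev[-1] + ins_cost(ch))
    for i, wc in enumerate(word, start=1):
        cur = [i * FULL]  # delete the first i letters of the word
        for j, tc in enumerate(text, start=1):
            cur.append(min(
                prev[j] + FULL,                  # delete wc
                cur[j - 1] + ins_cost(tc),       # insert tc
                prev[j - 1] + sub_cost(wc, tc),  # match or substitute
            ))
        prev = cur
    return prev[-1]


print(weighted_distance("cialis", "C 1 a l | s"))  # 9
print(weighted_distance("cialis", "specialized"))  # 60
```

With these guessed costs the obfuscated form scores 9 and "specialized" scores 60, reversing the plain-Levenshtein ranking. Choosing the costs and a match threshold is where the false-positive tuning mentioned above would happen.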
I remember that paper. I was impressed and sceptical at the same time. I could see it FPing a lot. One person in the crowd brought up Niagra vs. the V-drug word :) Cialis vs. Dial-Lisa, etc.

--Chris
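The false-positive concern can be illustrated with plain unit-cost Levenshtein distance (not necessarily the metric the paper used): an ordinary word such as "Niagra" sits exactly as close to the drug name as a genuine obfuscation does. A short Python sketch; the function name and word list are only illustrations.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


for candidate in ["v1agra", "niagra", "niagara"]:
    print(candidate, levenshtein("viagra", candidate))
# v1agra 1
# niagra 1
# niagara 2
```

Any threshold that catches "v1agra" here also catches "niagra", which is why a usable scheme needs extra knowledge such as look-alike weights or context.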