>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>Sent: Monday, February 28, 2005 10:34 AM
>To: Loren Wilton
>Cc: users@spamassassin.apache.org
>Subject: Re: Obfuscation (was: Millions and Billions) 
>
>
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>
>Loren Wilton writes:
>> Since a tool can generate the matching pattern and convert it to a re, it
>> seems that a tool could in theory generate a matching pattern and convert it
>> to something else that might be either more comprehensible or more
>> efficient.  Or possibly a tool could be made that would do a direct fuzzy
>> match from the unobfuscated word.  (However, I think this last possibility
>> would be slower than pre-obfuscating; but possibly it wouldn't be.)
>> 
>> The problem is that perl doesn't have any syntax to efficiently describe
>> this obfuscated match other than an incomprehensible regex.
>> 
>> Someone could invent such a tool, and it could either be a plugin to SA or a
>> part (or addon subroutine) called by perl itself.  In fact I believe that at
>> least two fuzzy matching plugins have been added to SA in the last week.
>> Whether they are as efficient, or more efficient, than the current horrid
>> re's is an interesting question.
>
>they actually generate the horrid REs internally. ;)
>
>A paper at the spam conference suggested using an Edit Distance algorithm
>with very good results; the idea being, the edit distance from "cialis" to
>"C 1 a l | s" isn't as far as it is to "specialized" or so on.
>
>if I recall correctly, someone submitted an implementation quite a while
>ago on our BZ, but I think the FP rates were too high.   Given the
>recent paper's published results, though, it may be there are good ways
>to tweak it to get FPs at a tolerable rate.
>
>If anyone wants to have a try, please do ;)

I remember that paper. I was impressed and sceptical at the same time. I
could see it FPing a lot. One person in the crowd brought up Niagara vs. the
V-drug word :) 

Cialis vs. Dial-Lisa, etc.
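
For anyone who wants to try it, here is a rough sketch of the edit-distance idea from the quoted message. It is plain Levenshtein distance in Python, not the paper's algorithm and not the implementation submitted on BZ. The names edit_distance and normalize, and the choice to strip whitespace and lowercase before comparing, are assumptions made purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalize(token: str) -> str:
    # Assumption: drop all whitespace and lowercase before comparing,
    # so "C 1 a l | s" is compared as "c1al|s".
    return "".join(token.split()).lower()


if __name__ == "__main__":
    for candidate in ("C 1 a l | s", "specialized"):
        print(candidate, edit_distance("cialis", normalize(candidate)))
    # The false-positive case raised at the conference:
    print("Niagara", edit_distance("viagra", normalize("Niagara")))
```

With that normalization, "C 1 a l | s" is 2 edits from "cialis" and "specialized" is 6, which matches the intuition in the quoted message. But "Niagara" is also only 2 edits from the V-drug word, so a raw distance threshold cannot separate the two cases by itself.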

--Chris 
