Re: [SAtalk] Distributed Checksum Clearinghouse

Scott Doty Tue, 05 Mar 2002 13:09:18 -0800

On Tue, Mar 05, 2002 at 09:14:18AM +0000, Matt Sergeant wrote:
> On Mon, 4 Mar 2002, Scott Doty wrote:
> > One of our senior system administrators, Kelsey, has had contact with Vipul
> > -- I understand Vipul is working on incorporating "fuzzy" hashes into Razor
> > using the nilsimsa algorithm.   (See http://freshmeat.net/projects/nilsimsa/
> > for more information.)
> >
> > I think the fuzzy matching would be much more appropriate for detecting spam
> > than a checksum or SHA hash.
> 
> I'm not sure how Vipul is going to do this (I don't follow the Razor list
> since Razor is so unreliable that we don't use it). I spent a week
> investigating Nilsimsa, even wrote a perl module for it (which I may
> release if I get permission), but the problem is that you have to run the
> nilsimsa check over every single hash in the database before you know you
> have an approximate match. This gets slow. I generated about 50,000
> nilsimsa hashes as a test, and running the nilsimsa test on those took
> over a second. Way too long.
> 
> So I ditched the idea of using Nilsimsa a while ago.
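
For concreteness, here is a minimal sketch of the brute-force scan Matt describes. It assumes each nilsimsa digest is 256 bits and that "approximate match" means a small Hamming distance between digests. The digests are random stand-ins rather than real nilsimsa output, and the names `hamming` and `linear_scan` are hypothetical, not taken from Razor or from Matt's Perl module.

```python
import random

DIGEST_BITS = 256  # assumed digest width (32 bytes)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")


def linear_scan(query: int, database: list, max_distance: int) -> list:
    """Compare the query against every stored digest; cost grows with len(database)."""
    return [d for d in database if hamming(query, d) <= max_distance]


# Stand-in data: 50,000 random digests (the size of Matt's test), not real nilsimsa hashes.
rng = random.Random(0)
database = [rng.getrandbits(DIGEST_BITS) for _ in range(50_000)]

# Plant one near-duplicate of a stored digest by flipping three of its bits.
query = database[123] ^ 0b111
matches = linear_scan(query, database, max_distance=8)
print(len(matches), "approximate match(es) after", len(database), "comparisons")
```

Every lookup touches the whole database, which is the cost being complained about.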


I wasn't aware that nilsimsa requires checking the test hash against every
hash in the database.  I take it there's no way to index the hashes to
speed up matching?  (E.g., maybe only check hashes that have approximately
the same number of bits set as the test hash?)  (Throw me a line here... ;)
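
To make the "same number of bits set" idea concrete, here is a rough sketch, assuming digests are compared by Hamming distance. It relies on the fact that two digests within distance d of each other cannot differ in bit count by more than d, so bucketing by bit count never loses a true match. `PopcountIndex` and its methods are hypothetical names, not part of Razor or of any nilsimsa library.

```python
from collections import defaultdict


def popcount(x: int) -> int:
    """Number of bits set in a digest."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")


class PopcountIndex:
    """Buckets digests by bit count so a query can skip buckets that cannot match."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, digest: int) -> None:
        self.buckets[popcount(digest)].append(digest)

    def query(self, digest: int, max_distance: int) -> list:
        weight = popcount(digest)
        hits = []
        # Only buckets whose bit count is within max_distance of the query's can match.
        for w in range(max(0, weight - max_distance), weight + max_distance + 1):
            for candidate in self.buckets.get(w, ()):
                # The bit count is only a prefilter; confirm with the full comparison.
                if hamming(digest, candidate) <= max_distance:
                    hits.append(candidate)
        return hits
```

How much this saves depends on how the bit counts are spread. If most digests have about half their bits set, most of the database lands in a handful of neighbouring buckets and the filter prunes little, so this is an illustration of the idea rather than a claim that it solves the speed problem.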

 -Scott

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
