On Tue, Mar 05, 2002 at 09:14:18AM +0000, Matt Sergeant wrote:
> On Mon, 4 Mar 2002, Scott Doty wrote:
> > One of our senior system administrators, Kelsey, has had contact with Vipul
> > -- I understand Vipul is working on incorporating "fuzzy" hashes into Razor
> > using the nilsimsa algorithm.  (See http://freshmeat.net/projects/nilsimsa/
> > for more information.)
> >
> > I think the fuzzy matching would be much more appropriate for detecting spam
> > than a checksum or SHA hash.
>
> I'm not sure how Vipul is going to do this (I don't follow the Razor list
> since Razor is so unreliable that we don't use it). I spent a week
> investigating Nilsimsa, even wrote a perl module for it (which I may
> release if I get permission), but the problem is that you have to run the
> nilsimsa check over every single hash in the database before you know you
> have an approximate match. This gets slow. I generated about 50,000
> nilsimsa hashes as a test, and running the nilsimsa test on those took
> over a second. Way too long.
>
> So I ditched the idea of using Nilsimsa a while ago.

I wasn't aware that nilsimsa required one to check against all the hashes in
the database.  I take it there's no way to index the hashes to speed up
matches?  (e.g., maybe only check hashes that have approximately the same
number of bits set as the test hash?)

(Throw me a line here... ;)

 -Scott
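
A rough sketch of the "same number of bits set" idea, in Python. Everything here is an assumption made for illustration: digests are treated as plain 256-bit integers, similarity is measured as the number of differing bits, and the names PopcountIndex, near, and max_diff are hypothetical (this is not the nilsimsa package's API, nor anything Razor does). The filter is safe in the sense that it cannot miss a match: two digests that differ in at most k bits can differ in their set-bit counts by at most k, so only buckets within k of the query's count need to be scanned.

```python
from collections import defaultdict

DIGEST_BITS = 256  # assumed digest width (64 hex characters)


def popcount(x: int) -> int:
    """Number of bits set in x."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return popcount(a ^ b)


class PopcountIndex:
    """Hypothetical index: digests bucketed by how many bits they have set."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, digest: int) -> None:
        self.buckets[popcount(digest)].append(digest)

    def near(self, query: int, max_diff: int) -> list:
        """Return stored digests differing from query in <= max_diff bits."""
        qc = popcount(query)
        lo = max(0, qc - max_diff)
        hi = min(DIGEST_BITS, qc + max_diff)
        hits = []
        # Only buckets whose bit count is within max_diff can hold a match.
        for count in range(lo, hi + 1):
            for digest in self.buckets.get(count, ()):
                if hamming(query, digest) <= max_diff:
                    hits.append(digest)
        return hits
```

How much this saves depends entirely on how the set-bit counts are spread out. If most digests have close to half their bits set, nearly every bucket in range is still scanned and the full comparison remains the bottleneck, so this is at best a cheap first-pass filter rather than a real index.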