On Wed, 6 Mar 2002, Kelsey Cummings wrote:

> On Tue, Mar 05, 2002 at 09:14:18AM +0000, Matt Sergeant wrote:
> > I'm not sure how Vipul is going to do this (I don't follow the Razor list
> > since Razor is so unreliable that we don't use it). I spent a week
> > investigating Nilsimsa, even wrote a Perl module for it (which I may
> > release if I get permission), but the problem is that you have to run the
> > nilsimsa check over every single hash in the database before you know you
> > have an approximate match. This gets slow. I generated about 50,000
> > nilsimsa hashes as a test, and running the nilsimsa test on those took
> > over a second. Way too long.
>
> Matt - could you quantify your test a bit? What kind of CPU was used to
> process the benchmark? I've heard some rumor to the effect that a few
> optimized search algorithms for nilsimsa have appeared that could
> resolve the obvious performance problems.
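The linear scan described in the quoted message can be sketched as follows. This is an illustration, not Matt's Perl module. It assumes the Nilsimsa digests have already been computed as 32-byte (256-bit) values, since the digest computation itself is not shown. It scores a pair the usual Nilsimsa way: 128 minus the number of differing bits, so identical digests score 128. The function names and the `threshold=100` cutoff are hypothetical.

```python
"""Linear-scan approximate matching over Nilsimsa digests (sketch)."""

import os
import time
from typing import List, Optional, Tuple


def nilsimsa_score(a: bytes, b: bytes) -> int:
    """Return 128 minus the Hamming distance between two 32-byte digests.

    Identical digests score 128; bitwise complements score -128.
    """
    if len(a) != 32 or len(b) != 32:
        raise ValueError("Nilsimsa digests are 256 bits (32 bytes)")
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return 128 - bin(diff).count("1")


def best_match(
    query: bytes, database: List[bytes], threshold: int = 100
) -> Optional[Tuple[int, int]]:
    """Compare the query against every stored digest: O(n) work per lookup.

    Returns (index, score) of the best digest scoring >= threshold, or None.
    """
    best = None
    for i, digest in enumerate(database):
        score = nilsimsa_score(query, digest)
        if score >= threshold and (best is None or score > best[1]):
            best = (i, score)
    return best


if __name__ == "__main__":
    # Hypothetical benchmark: 50,000 random digests plus one near-duplicate.
    query = os.urandom(32)
    near = bytearray(query)
    near[0] ^= 0b00011111  # flip 5 bits, so this entry scores 128 - 5 = 123
    database = [os.urandom(32) for _ in range(50_000)]
    database.append(bytes(near))
    start = time.perf_counter()
    match = best_match(query, database)
    elapsed = time.perf_counter() - start
    print(f"match={match} elapsed={elapsed:.3f}s")
```

Every lookup touches every stored digest, which is why the cost grows linearly with the size of the database.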
I suppose you could potentially use a tree-based search algorithm, but that would require understanding more about how Nilsimsa really works, and I just didn't have time for that. The server was a P3-400 iBus rackmount, I think.

Since doing that I've focussed more on SpamAssassin tests, which seem to be much more effective than any sort of database lookup, and fast enough for us (and we do about 7 million emails a day, 20% of which is spam).

-- 
Matt.

<:->get a SMart net</:->

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
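One way to read the "tree-based search" idea is a BK-tree, a metric tree that uses the triangle inequality to skip whole subtrees. This is a swapped-in technique, not something proposed in the thread, and only a sketch. It stores digests as 256-bit Python integers. Because a Nilsimsa score of s corresponds to a Hamming distance of 128 - s, a score cutoff of 100 becomes `max_dist=28`. The class and method names are hypothetical.

```python
"""BK-tree over Hamming distance for 256-bit digests (sketch)."""

import random
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


class BKNode:
    __slots__ = ("digest", "children")

    def __init__(self, digest: int) -> None:
        self.digest = digest
        # Children are keyed by their Hamming distance from this node's digest.
        self.children: Dict[int, "BKNode"] = {}


class BKTree:
    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def add(self, digest: int) -> None:
        """Insert a digest, walking down the edge labelled with its distance."""
        if self.root is None:
            self.root = BKNode(digest)
            return
        node = self.root
        while True:
            d = hamming(digest, node.digest)
            if d == 0:
                return  # exact duplicate; already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(digest)
                return
            node = child

    def search(self, query: int, max_dist: int) -> List[Tuple[int, int]]:
        """Return (digest, distance) pairs within max_dist of the query."""
        results: List[Tuple[int, int]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(query, node.digest)
            if d <= max_dist:
                results.append((node.digest, d))
            # Triangle inequality: only children whose edge key lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for key, child in node.children.items():
                if d - max_dist <= key <= d + max_dist:
                    stack.append(child)
        return results


if __name__ == "__main__":
    tree = BKTree()
    query = random.getrandbits(256)
    for _ in range(10_000):
        tree.add(random.getrandbits(256))
    tree.add(query ^ 0b111)  # near-duplicate, 3 bits away from the query
    print(tree.search(query, max_dist=28))  # score >= 100 <=> distance <= 28
```

Whether this beats the linear scan depends on the data. Distances between unrelated 256-bit digests cluster around 128, so with a wide search radius the tree may prune very little. It helps most when the radius is small relative to typical distances.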