On Wed, 6 Mar 2002, Kelsey Cummings wrote:

> On Tue, Mar 05, 2002 at 09:14:18AM +0000, Matt Sergeant wrote:
> > I'm not sure how Vipul is going to do this (I don't follow the Razor list
> > since Razor is so unreliable that we don't use it). I spent a week
> > investigating Nilsimsa, even wrote a perl module for it (which I may
> > release if I get permission), but the problem is that you have to run the
> > nilsimsa check over every single hash in the database before you know you
> > have an approximate match. This gets slow. I generated about 50,000
> > nilsimsa hashes as a test, and running the nilsimsa test on those took
> > over a second. Way too long.
>
> Matt - could you quantify your test a bit?  What kind of cpu was used to
> process the benchmark?  I've heard some rumor to the effect that there are
> a few optimized searching algorithms for nilsimsa that have appeared that
> could resolve the obvious performance problems.

I suppose you could potentially use a tree-based search algorithm, but
that would require understanding more about how Nilsimsa really works, and
I just didn't have time for that. The server was, I think, a P3-400 iBus
rackmount. Since doing that I've focussed more on SpamAssassin tests, which
seem to be much more effective than any sort of database lookup, and fast
enough for us (and we do about 7 million emails a day, 20% of which are spam).
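
For reference, the exhaustive check discussed in this thread can be sketched
roughly as follows. This is only a minimal illustration, not the Perl module
mentioned in the quoted message: it assumes each Nilsimsa digest is stored as
a 256-bit Python integer and that two digests are scored as 128 minus the
number of differing bits. The names nilsimsa_score and linear_scan, and any
threshold passed in, are hypothetical.

```python
from typing import List

DIGEST_BITS = 256  # assumption: a Nilsimsa digest is 256 bits wide


def nilsimsa_score(a: int, b: int) -> int:
    """Score two digests: 128 minus the count of differing bits.

    Identical digests score 128; digests differing in every bit score -128.
    """
    differing_bits = bin(a ^ b).count("1")  # Hamming distance via XOR
    return DIGEST_BITS // 2 - differing_bits


def linear_scan(query: int, digests: List[int], threshold: int) -> List[int]:
    """Return the indexes of all stored digests scoring at least `threshold`.

    Every stored digest is compared with the query, so the cost grows
    linearly with the size of the database.
    """
    return [
        index
        for index, digest in enumerate(digests)
        if nilsimsa_score(query, digest) >= threshold
    ]
```

Because an approximate match cannot be found with an exact-key lookup, every
message pays for one comparison per stored digest, which is why the check
slows down as the database grows.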

-- 
Matt.
<:->get a SMart net</:->


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk