Giampaolo: I hope you succeed.

I've given up hope of convincing folks (Mapquest in particular) that radius
searches can be indexed. You needn't pull the lat/long of every single entry,
run the distance function, and then discard the ones too far away. You
can index on LAT and LONG and structure the query so that the distance
function is evaluated (and the rest of the record fetched) only for the
"possible" lat/long values.
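
A rough sketch of what I mean, in Python with SQLite. The table name
(places), the column names, the sample cities and the helper names are all
made up for illustration, and it assumes a small radius that stays away from
the poles and the 180-degree meridian. The indexed BETWEEN clauses cut the
candidates down to a bounding box, and the distance function runs only on
what survives:

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Lat/long box that fully contains the search circle.

    Assumption: the circle does not cross the 180-degree meridian.
    If it reaches a pole, the longitude range falls back to +/-180.
    """
    r = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(r)
    ratio = math.sin(r) / max(math.cos(math.radians(lat)), 1e-12)
    dlon = math.degrees(math.asin(ratio)) if ratio < 1.0 else 180.0
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def radius_search(conn, lat, lon, radius_km):
    """Names of places within radius_km of (lat, lon)."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounding_box(lat, lon, radius_km)
    # Cheap, index-friendly prefilter: only rows inside the box come back.
    rows = conn.execute(
        "SELECT name, lat, lon FROM places "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat_lo, lat_hi, lon_lo, lon_hi),
    )
    # Exact distance check runs only on the "possible" rows.
    return [
        name
        for name, plat, plon in rows
        if haversine_km(lat, lon, plat, plon) <= radius_km
    ]


def make_demo_db():
    """In-memory table with a few sample rows (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
    conn.execute("CREATE INDEX idx_places_lat_lon ON places (lat, lon)")
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?)",
        [
            ("Washington", 38.9072, -77.0369),
            ("Baltimore", 39.2904, -76.6122),
            ("Frederick", 39.4143, -77.4105),
            ("New York", 40.7128, -74.0060),
        ],
    )
    return conn
```

Calling radius_search(make_demo_db(), 38.9072, -77.0369, 60) only ever
computes distances for rows inside the box. Here Frederick falls inside the
box but just outside the 60 km circle, so the distance check is still needed
for the corners, while New York is never fetched at all.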

Just because it's two orders of magnitude more efficient doesn't make
anybody listen.

Same conversation, different universe!

Dan

-----Original Message-----
From: Giampaolo Tomassoni [mailto:[EMAIL PROTECTED]
Sent: Monday, January 08, 2007 2:00 PM
To: [EMAIL PROTECTED]; users@spamassassin.apache.org
Subject: RE: [Devel-spam] FuzzyOcr 3.5.1 released


From: Andy Dills [mailto:[EMAIL PROTECTED]
>
> ...omissis...
>
> > I understand that the "order" keyword in select is potentially expensive,
> > but necessary because matches occur generally towards the most recent
> > entries, thus increasing the possibility of a match earlier on.  When your
> > hash count is in the thousands, earlier matches mean fewer queries to the
> > database, and potentially faster results.
>
> It's not just the order directive, it's the iteration throughout the
> entire database.
>
> Consider when the database grows to >50k records. For a new image that
> doesn't have a hash, that's 50k records that must be sorted then sent from
> the DB server to the mail server, then all 50k records must be checked
> against the hash before we decide that we haven't seen this image before.
> That just isn't a workable algorithm. If iteration throughout the entire
> database is a requirement, hashing is a performance hit rather than a
> performance gain.
>
> A better solution might be a separate daemon that holds the hashes in
> memory, to which you submit the hash being considered.

Another way could be the one described in my recent post (Message-ID:
<[EMAIL PROTECTED]>), in which close images
are basically clustered together thanks to a surrogate index.

giampaolo

>
> Honestly, I have been extremely impressed with having hashing turned
> completely off.
>
> Andy
>
> ---
> Andy Dills
> Xecunet, Inc.
> www.xecu.net
> 301-682-9972
> ---

