From: Andy Dills [mailto:[EMAIL PROTECTED]
>
> [snip]
>
> > I understand that the "order" keyword in select is potentially expensive,
> > but necessary because matches occur generally towards the most recent
> > entries, thus increasing the possibility of a match earlier on. When your
> > hash count is in the thousands, earlier matches mean fewer queries to the
> > database, and potentially faster results.
>
> It's not just the order directive, it's the iteration throughout the
> entire database.
>
> Consider when the database grows to >50k records. For a new image that
> doesn't have a hash, that's 50k records that must be sorted then sent from
> the DB server to the mail server, then all 50k records must be checked
> against the hash before we decide that we haven't seen this image before.
> That just isn't a workable algorithm. If iteration throughout the entire
> database is a requirement, hashing is a performance hit rather than a
> performance gain.
>
> A better solution might be a separate daemon that holds the hashes in
> memory, to which you submit the hash being considered.
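
The core of such a daemon could be sketched roughly as follows. This is only
a minimal sketch under stated assumptions: hashes are treated as plain
integers, near matches are judged by Hamming distance, and the names
HashStore, add and lookup are hypothetical, not taken from any existing code.
The socket layer that would accept submitted hashes is left out.

```python
class HashStore:
    """Hypothetical in-memory hash store; all names are illustrative."""

    def __init__(self):
        self._exact = set()   # constant-time exact-match lookups
        self._recent = []     # insertion order, oldest first

    def add(self, h: int) -> None:
        """Remember a hash, ignoring duplicates."""
        if h not in self._exact:
            self._exact.add(h)
            self._recent.append(h)

    def lookup(self, h: int, max_distance: int = 0) -> bool:
        """Return True if h, or a hash within max_distance bits, is known."""
        if h in self._exact:
            return True
        if max_distance == 0:
            return False
        # Assumed fuzzy fallback: scan newest entries first and stop at
        # the first hash whose Hamming distance is small enough.
        for known in reversed(self._recent):
            if bin(known ^ h).count("1") <= max_distance:
                return True
        return False
```

Exact matches are answered from the set without any iteration. Only the
optional fuzzy fallback walks the list, newest entries first, and it does so
in memory rather than after sorting and shipping every record from the DB
server.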

Other approaches could be the ones described in my recent post (Message-ID:
<[EMAIL PROTECTED]>), in which similar images are basically clustered
together thanks to a surrogate index.

giampaolo

> Honestly, I have been extremely impressed with having hashing turned
> completely off.
>
> Andy
>
> ---
> Andy Dills
> Xecunet, Inc.
> www.xecu.net
> 301-682-9972
> ---