On Mon, 8 Jan 2007, Jorge Valdes wrote:

> I do understand that in large environments, optimizations have to be made in
> order not to kill server performance, and expiration is probably something
> that could be done at "more convenient times". I will commit a script that
> can safely be run as a cronjob soon.

Excellent.

> I understand that the "order" keyword in select is potentially expensive, but
> necessary because matches occur generally towards the most recent entries,
> thus increasing the possibility of a match earlier on. When your hash count
> is in the thousands, earlier matches mean less queries to the database, and
> potentially faster results.

It's not just the order directive, it's the iteration throughout the entire
database.

Consider when the database grows to >50k records. For a new image that
doesn't have a hash, that's 50k records that must be sorted, then sent from
the DB server to the mail server, and then all 50k records must be checked
against the hash before we decide that we haven't seen this image before.

That just isn't a workable algorithm. If iteration throughout the entire
database is a requirement, hashing is a performance hit rather than a
performance gain.

A better solution might be a separate daemon that holds the hashes in memory,
to which you submit the hash being considered.

Honestly, I have been extremely impressed with having hashing turned
completely off.

Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---
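
For illustration only, here is a rough sketch of the in-memory idea suggested above. Nothing in it comes from the actual plugin: the `HashStore` class, its method names, the size limit, and the assumption that hashes are integers compared by Hamming distance are all hypothetical, since the thread does not say how two hashes are matched. It only shows the store a daemon might keep; the socket layer for submitting a hash is left out.

```python
from collections import OrderedDict


class HashStore:
    """Hypothetical in-memory image-hash store (illustrative only)."""

    def __init__(self, max_entries=50000, max_distance=2):
        self.max_entries = max_entries    # assumed cap, replaces cron expiry
        self.max_distance = max_distance  # assumed fuzzy-match tolerance
        self._hashes = OrderedDict()      # hash -> hit count, oldest first

    @staticmethod
    def distance(a, b):
        # Hamming distance between two integer hashes (an assumption;
        # the real comparison is not described in the thread).
        return bin(a ^ b).count("1")

    def add(self, value):
        # Store a new hash as the most recent entry, evicting the oldest
        # entry once the cap is exceeded.
        self._hashes[value] = self._hashes.get(value, 0)
        self._hashes.move_to_end(value)
        if len(self._hashes) > self.max_entries:
            self._hashes.popitem(last=False)

    def lookup(self, candidate):
        # Exact matches cost one dictionary lookup, no iteration.
        match = candidate if candidate in self._hashes else None
        if match is None:
            # Fuzzy matches scan newest first, entirely in memory, so no
            # sort and no transfer from a DB server is needed.
            for known in reversed(self._hashes):
                if self.distance(known, candidate) <= self.max_distance:
                    match = known
                    break
        if match is not None:
            # Reorder only after the scan has finished.
            self._hashes[match] += 1
            self._hashes.move_to_end(match)
        return match
```

A lookup for an unseen image still touches every stored hash in the fuzzy case, but it does so without the sort or the network round trip described above, and recently matched hashes are checked first.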