From: decoder [mailto:[EMAIL PROTECTED]]
>
> ...omissis...
>
> > Hash the hashes and store them in a suitable tree?
> I explained before that you cannot hash the hashes because a
> cryptographic hash is tolerance resistant. A fuzzy matching on such a
> hash of the actual hash is impossible then.
Ok. Just to add more useless thoughts to the ones already shot out, may I ask a question? I see that the first part of a hash is made of the file size, the image height and width, and the image colour count. Now, since the target is to construct a weak key for an image, I don't see the purpose of the file size and the colour count: both may change greatly depending on the image "carrier" (that is, the file format) and may relate too loosely to the true image content to be useful. Why are they included?

Let me also start a bit of brainstorming. I see in Hashing.pm that an image in the database matches the searched one if the components of its composite key all differ by less than a given threshold, expressed as a percentage of the value. This is interesting, because thanks to the threshold percentage we could use overlapping categories here.

Let me explain by focusing on a single parameter: the image width "w". Let's assume we have a db field "c" and a monotonic transfer function c = F(W), such that F(W - W*threshold) >= c - 1 and F(W + W*threshold) <= c + 1. Then, saying abs(w[i] - W) < W*threshold means W - W*threshold < w[i] < W + W*threshold and, by the assumed monotonicity of F, F(W - W*threshold) <= F(w[i]) <= F(W + W*threshold), that is, at most c - 1 <= F(w[i]) <= c + 1. This means that all the images whose width is within +-threshold of W fall at most in categories (c-1 and c) or (c and c+1), so you don't have to browse all the images, just the ones matching those values in column "c". Simply put, you can then "search by index" in the db instead of browsing every image in it. Please note that, given a W, you need to inspect the images in class c and either the ones in class c-1 or c+1, depending on the value of W itself (inspect c-1 if W < inv(F, c), c+1 otherwise).

This method could possibly be applied further. If, for example, you extend it to the image height as well, you end up inspecting images in a neighborhood of 4 image groups.
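To make the idea concrete, here is a small Python sketch of the width bucketing (the function names and the 10% threshold are my own; the log-scale F is just one choice satisfying the monotonicity property, spelled out at the end of this mail):

```python
import math

THRESHOLD = 0.10  # 10% tolerance, expressed as a fraction of the value

def width_bucket(w, t=THRESHOLD):
    """One possible transfer function c = F(w): logarithmic buckets
    whose width grows with w, so that w*(1 - t) always lands in
    bucket c - 1 when w lands in bucket c."""
    return math.floor(math.log(w) / math.log(1.0 / (1.0 - t)))

def candidate_buckets(W, t=THRESHOLD):
    """Buckets that can contain any db width w with |w - W| < W*t:
    just the buckets between the two edges of the tolerance window,
    i.e. a few adjacent values of the indexed column "c"."""
    lo = width_bucket(W * (1.0 - t), t)
    hi = width_bucket(W * (1.0 + t), t)
    return list(range(lo, hi + 1))

# A query for W = 100 only scans a few adjacent buckets, not the table:
print(width_bucket(100))       # -> 43
print(candidate_buckets(100))  # -> [42, 43, 44]
```

One caveat: with this particular F the tolerance window can occasionally straddle three adjacent buckets rather than two, depending on where W falls inside its bucket; either way the scan stays limited to a handful of index values.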
It can't be applied too deeply, however, since you would end up inspecting a 2^n neighborhood, with n being the number of classified parameters.

Classification may, however, be used for a surrogate key. Take some n key components (for example, h*w and the colour frequencies) as the coordinates of a point in a hypercube (a "component space"). Then choose a distance function and categorize the images by their distance from a fixed point (the origin, for example). You would end up inspecting images in only two clusters.

A good integer transfer function could be:

  F(w) = floor( ln(w) / ln(1/(1 - threshold)) )

the inverse being:

  W(c) = (1/(1 - t))^c

EOT (End Of Thoughts). Please don't flame too hard... :)

Thanks,
giampaolo

> Chris
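P.S. A quick Python sketch of the surrogate-key idea, under my own assumptions (Euclidean distance from the origin as the distance function, an arbitrary component vector): bucket the distance with the same logarithmic F. Since scaling every non-negative component by at most +-t scales the Euclidean norm by at most +-t as well, the same adjacent-buckets argument applies to the radius. The radius test is of course only a pre-filter: the surviving candidates still have to be checked component by component.

```python
import math

def radial_class(components, t=0.10):
    """Surrogate key: the Euclidean distance of the component vector
    (e.g. [h*w, colour frequencies...]) from the origin, pushed
    through the same logarithmic transfer function F."""
    r = math.sqrt(sum(x * x for x in components))
    return math.floor(math.log(r) / math.log(1.0 / (1.0 - t)))

# Two images whose components all agree within the threshold fall in
# the same or an adjacent radial class:
a = radial_class([640 * 480, 120, 80, 40])
b = radial_class([640 * 480 * 0.95, 120 * 1.05, 80, 40])
print(a, b, abs(a - b) <= 1)
```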