> > The photos are just coming straight from my digital camera. Same
> > format (JPEG), varying size (6-10 megapixel) and I would like to be
> > able to pick one and then query the database for similar ones. For
> > example: I pick a photo which is more or less a portrait of someone,
> > the query should return other photos with more or less portraits. If I
> > pick a landscape with a lot of green and a mountain the query should
> > result in other nature (mostly green) photos. Something along these
> > lines, of course the matches won't be perfect because I'm looking for
> > a simple algorithm, but something along these lines.
> >
>
> Ah. In that case, SIFT isn't for you. SIFT would work well if you have
> multiple photos of the same object. Say, a building from different
> angles, or a vase against different backdrops.
>
> If I'm understanding you correctly, what you're attempting here is very
> general and well into the highly experimental. I've been wishing for
> such a feature to appear in something like Google Image Search
> (pick/submit a photo and return similar images found on the web). I'm
> sure if there's even a practical solution, Google (or MS) would be on it
> already.
>
> The problem is that there isn't really one. Despite what you may see
> claimed in university press releases and research papers, the current
> crop of algorithms don't work very well, at least according to my
> understanding and discussion with researchers in this field. The glowing
> results tend to be from tests done under ideal conditions and there's no
> real practical and commercial solution.
>
> If you restrict the domain somewhat, there are some solutions, but none
> trivial. You are probably aware of the face searches available on Google
> and Live.
>
> The histogram approach suggested by Shane Geiger may work for some cases
> and in fact would work very well for identical resized images. I doubt
> it will work for the general case. A mountain with a grassy plain at
> noon has quite a different histogram from one at sunset, and yet both
> have related content. Manual tagging of the images, a la Flickr, would
> probably be your best bet.
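
To make the histogram point above concrete, here is a minimal sketch of a coarse color-histogram comparison. It is not Shane Geiger's code. The function names, the choice of 4 bins per channel, and the synthetic "noon" and "sunset" pixel lists are all assumptions made up for illustration; real pixels would come from an imaging library.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram of an iterable of (r, g, b) tuples.

    Hypothetical helper: each 0-255 channel value is mapped to one of
    `bins_per_channel` buckets, giving bins_per_channel ** 3 bins total.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to its bucket index (0 .. n-1).
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        hist[(ri * n + gi) * n + bi] += 1
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    # Normalize so images of different sizes are comparable.
    return [v / count for v in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Synthetic stand-ins for real photos (made-up colors, half sky, half grass).
noon = [(110, 160, 230)] * 100 + [(60, 150, 50)] * 100
noon_resized = [(110, 160, 230)] * 25 + [(60, 150, 50)] * 25
sunset = [(240, 120, 40)] * 100 + [(40, 70, 30)] * 100

same_scene_resized = histogram_intersection(
    color_histogram(noon), color_histogram(noon_resized))
noon_vs_sunset = histogram_intersection(
    color_histogram(noon), color_histogram(sunset))

print(same_scene_resized)  # 1.0
print(noon_vs_sunset)      # 0.0
```

With these made-up colors the resized copy matches perfectly, while the sunset version of the "same" scene shares no bins at all with the noon one. Note that building each histogram reads every pixel once, but the comparison itself only walks the 64 bins.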

Since you seem to know quite a bit about this topic, what is your
opinion on the apparently 'generic' algorithm described here:
http://grail.cs.washington.edu/projects/query/ ?
So far it seems to me that it does what I'm asking for. It does even
more, because it can take a hand-drawn sample image and query the
database for similar photos.

There is even a Python implementation of it here:
http://members.tripod.com/~edcjones/pycode.html

On the histogram method, I agree that it won't work, partly because of
what you say and partly because it is terribly slow, since it's
comparing every single pixel.

Thanks,
Daniel