<snip> The main purpose of the FuzzyOcr's db was of course to avoid computing the OCR passes needed to decode the image text for known images. The problem is that the cache content is not searched for an exact match of the key values (which are image type, width, height, number of colors and color frequencies): it looks for the best match of these values within a given range. This has a number of drawbacks:
a) range search defeats look-up indexing in the db, thereby resulting in browsing the whole db for a match; </snip> Range searching in a database is (can be?) vastly faster than a full table scan. You have to USE the indexes, not just assume an FTS will be required. The database optimizer _SHOULD_ figure this out, but only if it has reasonable statistics and is passed a reasonable query. I got some SQL from MapQuest once, that had a WHERE clause containing the arithmetic to compute the distance from a given location (a "radius" search). As coded, of course, a Full Table Scan was required and the distance function was evaluated to determine a row's presence in the result set. It was very slow, even for just a few hundred thousand location records. I indexed the Latitude and Longitude columns, and expressed the query without the radius in the WHERE, but rather the ranges of possible Latitude and Longitude values (In effect, the rectangle that just contained the circle the user desired). The unwanted "corners" of the result set were discarded, rather than every single row outside the desired radius. The performance gains (for normal radiuses, 10, 20, 50, 100 miles) were enormous. The average gain was 100:1 - two orders of magnitude. MapQuest, of course, wasn't interested. The SQL ran on their clients' machines, not theirs. If fuzzyOCR caching method has any merit at all, tuning the SQL and/or the database will provide decent performance. "Explain Execution Plan" is your friend! Dan Barker