> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of
> snowcrash+sa
>
> hi andy,
>
> > For what it's worth, the fuzzyocr hashing is of very limited value, and in
> > many cases is a severe performance hit. I found that scanning the hashes,
> > due to the "fuzzy" nature, is more costly than just rescanning the file
> > with OCR, as *each* *and* *every* hash must be checked iteratively.
>
> now, *that's* an interesting point to consider.
>
> i'd be interested in what, then, the 'goal' of the hashing/comparison *is*?
>
> is it performance, and it just missed the mark for the reasons you state?
> or is it something else?

The main purpose of FuzzyOcr's db was, of course, to avoid recomputing the OCR passes needed to decode the image text for images it has already seen. The problem is that the cache is not searched for an exact match of the key values (image type, width, height, number of colors and color frequencies): it looks for the best match of these values within a given range. This has a number of drawbacks:

a) a range search defeats look-up indexing in the db, so the whole db has to be scanned for a match;
b) a range search also increases false-positive matches against the db content;
c) the db caches OCR results, so a match on it may return an unwanted or imprecise result after you tweak the FuzzyOcr config and/or word files.

The first drawback can lead to long processing times, and even timeouts, on a mail server under no more than medium load; the second is probably the worst problem for most of us; and the last is, well, yet another problem.

So, yes: FuzzyOcr's cache was meant to improve performance and, yes again, it basically missed the mark. The solution is simply to discard the cache db and run the OCR phases on each and every image: on all but the most lightly loaded servers this is the most effective way to deal with it.

Most of us are used to turning glitches off while keeping the good work... :)

Giampaolo

> dunno.
>
> but, your point bears some benchmarking ...
>
> thx!
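
To make drawbacks a) and b) concrete, here is a minimal Python sketch. It is not FuzzyOcr's actual code (the plugin is written in Perl and I am not reproducing its schema): `ImageKey`, `exact_lookup`, `fuzzy_lookup` and the 5% tolerance are all hypothetical names and values chosen for illustration. It only shows why an exact key can be resolved with one hashed lookup, while a "best match within a range" forces a comparison against every cached entry.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ImageKey(NamedTuple):
    """Hypothetical cache key built from the features named in the post."""
    kind: str                      # image type, e.g. "gif"
    width: int
    height: int
    n_colors: int
    color_freqs: Tuple[int, ...]   # pixel counts of the most frequent colors


def exact_lookup(cache: Dict[ImageKey, str], key: ImageKey) -> Optional[str]:
    """Exact match: one hashed lookup, whatever the cache size."""
    return cache.get(key)


def fuzzy_lookup(
    cache: Dict[ImageKey, str], key: ImageKey, tol: float = 0.05
) -> Tuple[Optional[str], int]:
    """Best match within a relative tolerance (assumed 5% here).

    Returns the cached OCR text of the closest entry (or None) together
    with the number of entries that had to be compared.
    """
    best: Optional[str] = None
    best_dist: Optional[float] = None
    comparisons = 0
    for cand, ocr_text in cache.items():   # no index helps: visit every entry
        comparisons += 1
        if cand.kind != key.kind or len(cand.color_freqs) != len(key.color_freqs):
            continue
        pairs = [
            (cand.width, key.width),
            (cand.height, key.height),
            (cand.n_colors, key.n_colors),
        ]
        pairs.extend(zip(cand.color_freqs, key.color_freqs))
        # Relative difference of each numeric feature against the query.
        diffs = [abs(a - b) / max(abs(b), 1) for a, b in pairs]
        if max(diffs) > tol:
            continue                        # outside the allowed range
        dist = sum(diffs)
        if best_dist is None or dist < best_dist:
            best, best_dist = ocr_text, dist
    return best, comparisons
```

With a cache of N entries, `fuzzy_lookup` always performs N comparisons, hit or miss, which is the cost that has to be weighed against simply rerunning the OCR. It also returns the cached text for any image that is merely close to a stored one, which is how the false positives in b) arise.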