Chris,

> > AFAIK though it isn't possible to place a cap on the FuzzyOCR score. I
> > don't want to, but I detune it purely to reduce the likelihood of
> > something hitting my discard threshold by OCR alone.
>
> If you consider this feature so important, then I could implement a
> max_score feature that caps the score done by word recognition. This is
> easy to implement.
>
> Or should it rather be a cap to all FuzzyOcr rules, including the others
> like malformed file etc?

For me a cap on the total score from FuzzyOcr was mandatory. It was
unacceptable that FuzzyOcr alone could exceed the threshold, which typically
happened when a multitude of similar FuzzyOcr hits occurred. I kept
patching previous versions with:

--- FuzzyOcr.pm.ori Sun Jan 7 13:05:08 2007
+++ FuzzyOcr.pm Tue Jan 9 15:09:24 2007
@@ -927,4 +927,5 @@
         infolog($debuginfo) unless ($conf->{focr_enable_image_hashing} == 3);
     }
+    $score = 5 if $score > 5; # !!! Mark
     for my $set ( 0 .. 3 ) {
         $pms->{conf}->{scoreset}->[$set]->{"FUZZY_OCR"} = $score;

  Mark
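
For illustration only, here is the same clamping idea as a standalone Perl
sketch. The helper name cap_score and the sample hit scores are hypothetical
and not part of FuzzyOcr; only the limit of 5 is taken from the patch above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a FuzzyOcr function): clamp a score to a maximum.
sub cap_score {
    my ($score, $max_score) = @_;
    return $score > $max_score ? $max_score : $score;
}

# Made-up scores for several similar hits in one message.
my @hit_scores = (4, 4, 4);

# Sum the individual hits, as happens when many similar matches pile up.
my $total = 0;
$total += $_ for @hit_scores;

# Without a cap the total could cross a discard threshold on its own;
# with the cap it never contributes more than 5 points.
printf "uncapped: %s, capped: %s\n", $total, cap_score($total, 5);
```

A score at or below the limit passes through unchanged, so the cap only
affects the case where many hits add up.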