Chris,

> > AFAIK though it isn't possible to place a cap on the FuzzyOCR score. I
> > don't want to, but I detune it purely to reduce the likelihood of
> > something hitting my discard threshold by OCR alone.
>
> If you consider this feature so important, then I could implement a
> max_score feature that caps the score done by word recognition. This is
> easy to implement.
>
> Or should it rather be a cap to all FuzzyOcr rules, including the others
> like malformed file etc?

For me a cap on the total score from FuzzyOcr was mandatory.

It was unacceptable that FuzzyOcr alone could exceed the threshold,
typically when a multitude of similar FuzzyOcr hits occurred.
I kept patching previous versions with the following, which caps
the score at 5:

--- FuzzyOcr.pm.ori     Sun Jan  7 13:05:08 2007
+++ FuzzyOcr.pm Tue Jan  9 15:09:24 2007
@@ -927,4 +927,5 @@
             infolog($debuginfo) unless ($conf->{focr_enable_image_hashing} == 3);
         }
+        $score = 5  if $score > 5;  # !!! Mark
         for my $set ( 0 .. 3 ) {
             $pms->{conf}->{scoreset}->[$set]->{"FUZZY_OCR"} = $score;

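The hard-coded 5 could just as well come from configuration, along the
lines of the max_score idea quoted above. A minimal standalone sketch,
assuming a hypothetical option name focr_max_score and a hypothetical
helper cap_score (neither exists in FuzzyOcr as far as I know):

```perl
use strict;
use warnings;

# Hypothetical helper: clamp a FuzzyOcr score to an optional maximum.
# A maximum of undef or 0 is treated as "no cap".
sub cap_score {
    my ($score, $max) = @_;
    return $score unless defined $max && $max > 0;
    return $score > $max ? $max : $score;
}

# 'focr_max_score' is an assumed option name, used here for illustration only.
my %conf = ( focr_max_score => 5 );

# Show the effect of the cap on a few sample scores.
for my $raw (3, 5, 12.5) {
    printf "%s -> %s\n", $raw, cap_score($raw, $conf{focr_max_score});
}
```

In the patched spot this would replace the one-line clamp, with the
capped value then written into all four score sets as before.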

Mark
