Hi there softworkz.

Having worked with the OCR filter's output before, I'd like to suggest a modification to your new filter. It's not something that should delay the patch, just a nice addendum. It could be done in a separate patch, or I could even do it myself in the future. But I'm leaving the comment here anyway for you to consider.

If you take a look at vf_ocr, you'll see that it sets the "lavfi.ocr.confidence" metadata field. Downstream filters can check that field to accept only results above a certain confidence threshold and discard the rest. This is very useful when doing OCR with non-ASCII characters, as I do with Spanish.

So I propose an option like this:

  { "confidence", "Sets the confidence threshold for valid OCR. Default 80.", OFFSET(confidence), AV_OPT_TYPE_INT, {.i64=80}, 0, 100, FLAGS },

Then, after OCR but before converting the result to a text subtitle frame, you average all the confidences reported by tesseract and compare that average against the option value.
Something like this:

  /* Guard against division by zero when tesseract reports no confidences. */
  int average = number_of_confidence_items > 0
              ? sum_of_all_confidences / number_of_confidence_items : 0;
  if (average >= s->confidence) {
    do_your_thing();
  } else {
    av_log(ctx, AV_LOG_DEBUG, "Confidence average %d under threshold. Text detected: '%s'\n", average, text);
  }
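
To make the averaging step a bit more concrete, here is a minimal standalone sketch. It assumes the confidences arrive as an int array terminated by -1, which is, as far as I recall, what TessBaseAPIAllWordConfidences() returns and what vf_ocr iterates over. The helper names average_confidence() and passes_threshold() are hypothetical and only for illustration, not something from your patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: integer average of a -1-terminated confidence array.
 * Returns -1 when the array is NULL or holds no confidences at all. */
static int average_confidence(const int *confs)
{
    long sum = 0;
    int count = 0;

    if (!confs)
        return -1;
    for (; confs[count] != -1; count++)
        sum += confs[count];
    return count > 0 ? (int)(sum / count) : -1;
}

/* Hypothetical helper: 1 if the average reaches the threshold (0..100),
 * 0 otherwise. An empty result never passes, because its average is -1. */
static int passes_threshold(const int *confs, int threshold)
{
    assert(threshold >= 0 && threshold <= 100);
    return average_confidence(confs) >= threshold;
}
```

In the filter this would be called with the array from tesseract and s->confidence as the threshold. Returning -1 for an empty result means that "no text detected" is rejected even with the threshold set to 0, which may or may not be what you want.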

Also, I would like to run some tests with Spanish OCR, as I had to explicitly allowlist our non-ASCII chars when using the OCR filter, and I don't know how yours will behave in that situation. Maybe having the chars allowlist option here too is a good idea. But, again: none of this should delay the patch, as your work is much more important than this kind of nice-to-have functionality, which could easily be implemented later by anyone.

Thanks,
Daniel.