Hi,
On 12/02/2022 22:13, Alberto Simoes wrote:
> Hi
> I am OCRing a lot of documents. I have a document with very poor
> quality, and surely nothing will be recognized. But I need a stable
> pipeline, and while I was expecting tesseract just to return an empty
> document, I am getting this error:
>
>     Detected 958 diacritics
>     Error during processing.
>
> Is there anything I can do to use tesseract more reliably, without the
> chance of getting it to just die?
You can try using a different binarisation method, or cleaning up the
images before doing OCR. Do you have an example you can share?
Tesseract 5.0.0 should support -c thresholding_method=2 (Sauvola
binarisation instead of the default Otsu), and you can additionally pass
--dpi 300 (or whatever the actual resolution of your image is).
That might make it more robust even without pre-processing your images.
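
For instance, a rough sketch of calling the command line with those two
options from Python could look like this. The helper names and the 300 DPI
default are only placeholders, and a non-zero exit status is simply treated
as "no text" so that the pipeline keeps going:

```python
import subprocess


def build_tesseract_command(image_path, dpi=300, thresholding_method=2):
    """Build a tesseract command line that prints the recognised text to stdout."""
    return [
        "tesseract",
        str(image_path),
        "stdout",  # special output base: print the text instead of writing a file
        "--dpi", str(dpi),
        "-c", f"thresholding_method={thresholding_method}",
    ]


def ocr_or_empty(image_path, **options):
    """Run tesseract; return "" instead of failing when it exits with an error."""
    result = subprocess.run(
        build_tesseract_command(image_path, **options),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # e.g. "Error during processing." on an unreadable page
        return ""
    return result.stdout
```

Something like ocr_or_empty("page.png", dpi=150) would then give you either
the text or an empty string, which you can log and skip.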
> By the way, I am using it through pytesseract, but I do not think that
> is the problem.
I don't know if pytesseract supports these extra options, so you might
have to fiddle with that.
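
One thing to try: pytesseract's image_to_string has a `config` argument for
extra command-line flags. Below is an untested sketch of how that could be
combined with an error handler. safe_image_to_string and its `ocr` parameter
are made-up names for illustration, and it assumes that pytesseract reports
a failed run as TesseractError, which is a RuntimeError subclass:

```python
def safe_image_to_string(image, dpi=300, thresholding_method=2, ocr=None):
    """Return the OCR text, or "" if tesseract fails on the image.

    `ocr` is a hypothetical hook so the OCR call can be swapped out;
    by default pytesseract.image_to_string is used.
    """
    if ocr is None:
        import pytesseract  # imported lazily; needs pytesseract installed
        ocr = pytesseract.image_to_string

    config = f"--dpi {dpi} -c thresholding_method={thresholding_method}"
    try:
        return ocr(image, config=config)
    except RuntimeError:
        # Assumption: pytesseract raises TesseractError (a RuntimeError
        # subclass) when the tesseract process exits with an error.
        return ""
```

Catching only RuntimeError means that a missing tesseract binary still shows
up as an error instead of being silently turned into an empty document.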
Regards,
Merlijn