Re: [tesseract-ocr] Too many diacritics can make process die?

Merlijn B.W. Wajer Sun, 13 Feb 2022 00:14:58 -0800

Hi,

On 12/02/2022 22:13, Alberto Simoes wrote:

Hi
I am OCRing a lot of documents. I have a document with very poor quality, and surely nothing will be recognized. But I need a stable pipeline, and while I was expecting tesseract just to return an empty document, I am getting this error:
Detected 958 diacritics
Error during processing.
Is there anything I can do to use tesseract more reliably, without the chance of getting it to just die?

You can try using a different binarisation method, or cleaning up the images before doing OCR. Do you have an example you can share?

Tesseract 5.0.0 should support -c thresholding_method=2, and additionally you can pass --dpi 300 (or whatever the actual resolution is) for your image. That might make it more robust even without pre-processing your images.

By the way, I am using it through pytesseract, but I do not think that is the problem.

I don't know if pytesseract supports these extra options, so you might have to fiddle with that.
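
As a rough sketch of what that fiddling could look like: pytesseract's image_to_string accepts a config string that is handed to the tesseract command line, so the two options above can probably be passed that way. The helper names below (build_config, ocr_or_empty) are made up for illustration, and the sketch assumes that a failed run surfaces as a Python exception which the pipeline is happy to treat as an empty document.

```python
from typing import Callable, Optional


def build_config(dpi: Optional[int] = 300,
                 thresholding_method: Optional[int] = 2) -> str:
    """Build the extra command-line options as one config string."""
    parts = []
    if dpi is not None:
        parts.append(f"--dpi {dpi}")
    if thresholding_method is not None:
        parts.append(f"-c thresholding_method={thresholding_method}")
    return " ".join(parts)


def ocr_or_empty(image, ocr: Optional[Callable[..., str]] = None, **options) -> str:
    """Run OCR and return "" instead of raising when processing fails.

    `ocr` defaults to pytesseract.image_to_string; it is injectable only
    so the wrapper can be exercised without tesseract installed.
    """
    if ocr is None:
        import pytesseract  # imported lazily; needed only for real OCR
        ocr = pytesseract.image_to_string
    try:
        return ocr(image, config=build_config(**options))
    except Exception as exc:
        # Deliberately broad (assumption: any failure means "no text").
        # Narrow this if you want to tell different errors apart.
        print(f"OCR failed, returning empty text: {exc}")
        return ""
```

With a PIL image loaded as img, calling ocr_or_empty(img) would then either return the recognized text or an empty string; pass dpi=... to match the real resolution of your scans.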


Regards,
Merlijn

