Hi, I'm fairly new to Tesseract and can't seem to find a proper answer to the questions below. I would appreciate any insight, and apologies for any mistakes! I'm working on extracting text from images similar to the one shown below: warehouse boxes with all kinds of different labels. The images often have poor lighting and awkward camera angles.
Tesseract seems to perform badly under conditions such as:

- Poor camera angle

For example, the image here <https://user-images.githubusercontent.com/43946966/122963951-5990bb80-d3b9-11eb-9b24-bb65ba4a3a3e.jpg> returns an output of:

['L2 Sy', "////’7/'7///////////////"]

on all variations of --oem and --psm values. Perspective correction for the image here <https://user-images.githubusercontent.com/43946966/122964090-7d540180-d3b9-11eb-8014-4e40b81b39f6.jpg> gives a slightly better output (though still poor) of:

['R19 159 942 sEMY', 'V/ ////////////////////I////I/////////////']

again, on all variations of --oem and --psm values.

My questions are:

1. Why does Tesseract seem to perform so badly on images with poor perspective, compared to alternatives like Vision API <https://cloud.google.com/vision> and PaddleOCR <https://github.com/PaddlePaddle/PaddleOCR>, which are able to extract the text fairly well? Is this an issue that can be corrected through some sort of fine-tuning in Tesseract? Or is this a weak point of Tesseract that has to be addressed with preprocessing (such as blurring, thresholding, etc.)? If that is the case, the alternative solutions above seem better, as they do not require such preprocessing.

2. Despite changing the values for --oem and --psm as shown here <https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method>, the output stays the same. Is this expected?

The images are taken from Google and are only representative of the images that I am working with.

Thank you so much for your time!
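
P.S. For anyone who wants to reproduce the --oem / --psm comparison, here is a minimal sketch of how such a sweep can be run and the outputs compared. The helper names `build_configs` and `group_by_output` are hypothetical, and the default mode lists are just an illustrative subset. The OCR call is passed in as a function, so the sketch itself does not depend on any particular wrapper; the commented wiring assumes pytesseract and Pillow are installed and uses a placeholder file name.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Iterable, List


def build_configs(
    oems: Iterable[int] = (1, 3),
    psms: Iterable[int] = (3, 4, 6, 7, 11, 12),
) -> List[str]:
    """Return one Tesseract config string per (--oem, --psm) pair."""
    return [f"--oem {oem} --psm {psm}" for oem, psm in product(oems, psms)]


def group_by_output(
    ocr: Callable[[str], str],
    configs: Iterable[str],
) -> Dict[str, List[str]]:
    """Call ocr(config) for each config and group the configs by the
    whitespace-stripped text that came back."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for config in configs:
        groups[ocr(config).strip()].append(config)
    return dict(groups)


# Example wiring (assumption: pytesseract and Pillow are installed,
# and "box.jpg" is a placeholder path):
#
# import pytesseract
# from PIL import Image
#
# image = Image.open("box.jpg")
# groups = group_by_output(
#     lambda cfg: pytesseract.image_to_string(image, config=cfg),
#     build_configs(),
# )
# print(len(groups), "distinct outputs across all configs")
```

If every configuration ends up in a single group, the output really is identical across modes, which is what I am seeing.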