[tesseract-ocr] Poor performance on images with poor angles

Royce Ho Tue, 22 Jun 2021 21:12:45 -0700

Hi, I'm fairly new to Tesseract and can't seem to find a clear answer to
the questions below. I would appreciate any insight, and apologies for any
mistakes! I'm working on extracting text from images that are similar to the
one shown below: warehouse boxes with all kinds of different labels. The
images often have poor lighting and angles.

Tesseract seems to perform badly under conditions such as:

- Poor camera angle

For example, the image here
<https://user-images.githubusercontent.com/43946966/122963951-5990bb80-d3b9-11eb-9b24-bb65ba4a3a3e.jpg>
returns an output of:
['L2 Sy', "////’7/'7///////////////"] on all variations of --oem and --psm
values.

Perspective correction for the image here
<https://user-images.githubusercontent.com/43946966/122964090-7d540180-d3b9-11eb-8014-4e40b81b39f6.jpg>
gives a slightly better output (though still poor) of:
['R19 159 942 sEMY', 'V/ ////////////////////I////I/////////////'] again,
on all variations of --oem and --psm values.
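
For readers unfamiliar with the term, perspective correction means mapping
the four corners of the skewed label onto an upright rectangle. The sketch
below is only an illustration of that idea in plain Python, not necessarily
how the corrected image above was produced. The corner coordinates and the
function names (solve_linear, homography, warp_point) are hypothetical. In
practice a library such as OpenCV (cv2.getPerspectiveTransform followed by
cv2.warpPerspective) would estimate the same matrix and also resample the
pixels.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return the 3x3 matrix mapping four src (x, y) points to dst (u, v).

    The bottom-right entry is fixed to 1, leaving eight unknowns that are
    determined by the eight equations from the four point pairs.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(hm, x, y):
    """Apply the homography to one point, including the projective divide."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    u = (hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w
    v = (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w
    return u, v


# Hypothetical label corners in the photo (TL, TR, BR, BL) and the upright
# 400x200 rectangle they should land on.
label_corners = [(120, 80), (520, 140), (500, 400), (100, 300)]
upright = [(0, 0), (400, 0), (400, 200), (0, 200)]
H = homography(label_corners, upright)
```

Each corner in label_corners is sent to the matching corner in upright by
warp_point(H, x, y); warping every pixel the same way yields the flattened
label that is then passed to the OCR engine.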

My questions are:

1. Why does Tesseract seem to perform so badly on such images with poor
perspectives, compared to alternatives like Vision API
<https://cloud.google.com/vision> and PaddleOCR
<https://github.com/PaddlePaddle/PaddleOCR>, which are able to extract
text fairly well? Is this an issue that can be corrected through some sort
of fine-tuning in Tesseract? Or is this a weak point of Tesseract that has
to be addressed with preprocessing (such as blurring, thresholding, etc.)?
If that is the case, the alternative solutions above seem better, as they
do not require such preprocessing.
2. Despite changing the values for --oem and --psm as shown here
<https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method>,
the output stays the same. Is this expected?
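
One way to check whether the modes really give identical output is to run
them side by side and group the settings by the text they return. The
following is a minimal sketch of such a sweep, assuming the tesseract
command-line tool is on PATH. The function names, the image file name in the
usage note, and the particular subset of --oem and --psm values are
assumptions made for illustration.

```python
import itertools
import subprocess

# Arbitrary subset chosen for illustration; adjust to the modes of interest.
OEM_VALUES = [1, 3]
PSM_VALUES = [3, 4, 6, 7, 11, 12, 13]


def build_command(image_path, oem, psm):
    """Build a tesseract CLI call that writes the recognized text to stdout."""
    return ["tesseract", image_path, "stdout",
            "--oem", str(oem), "--psm", str(psm)]


def run_sweep(image_path, oems=OEM_VALUES, psms=PSM_VALUES):
    """Run every (oem, psm) combination and return {(oem, psm): text}."""
    results = {}
    for oem, psm in itertools.product(oems, psms):
        proc = subprocess.run(build_command(image_path, oem, psm),
                              capture_output=True, text=True)
        # A failed run leaves stdout empty, so it shows up as "" here.
        results[(oem, psm)] = proc.stdout.strip()
    return results


def group_by_output(results):
    """Group settings by the text they produced: {text: [(oem, psm), ...]}."""
    groups = {}
    for setting, text in results.items():
        groups.setdefault(text, []).append(setting)
    return groups
```

Calling group_by_output(run_sweep("box_label.jpg")) on a test image (the file
name is a placeholder) returns a single group if every setting produced the
same text, and several groups otherwise.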

The images are taken from Google and are only representative of the images
that I am working with.

Thank you so much for your time!

