Hi, I'm fairly new to Tesseract and can't find a clear answer to the 
questions below. I would appreciate any insight, and apologies for any 
mistakes! I'm working on extracting text from images similar to the 
ones linked below: warehouse boxes with all kinds of different labels. The 
images often have poor lighting and awkward camera angles.

Tesseract seems to perform badly under conditions such as:

   - Poor camera angle

For example, the image here 
<https://user-images.githubusercontent.com/43946966/122963951-5990bb80-d3b9-11eb-9b24-bb65ba4a3a3e.jpg> 
returns an output of:
['L2 Sy', "////’7/'7///////////////"] on all variations of --oem and --psm 
values.

Perspective correction for the image here 
<https://user-images.githubusercontent.com/43946966/122964090-7d540180-d3b9-11eb-8014-4e40b81b39f6.jpg> 
gives a slightly better output (though still poor) of:
['R19 159 942 sEMY', 'V/ ////////////////////I////I/////////////'] again, 
on all variations of --oem and --psm values. 
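
For readers unfamiliar with the step, perspective correction is commonly 
done by estimating a homography from four corner points of the label and 
warping the image with it. The sketch below is only an illustration of that 
idea in plain Python, not the exact code used for the image above. The corner 
coordinates and the 400x250 target size are hypothetical, and the helper 
names (solve_linear, homography, warp_point) are made up for this example.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    # The bottom-right entry is fixed to 1 to remove the scale ambiguity.
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the homography to one point (with the perspective divide)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical label corners in the photo: TL, TR, BR, BL.
src = [(120, 80), (520, 140), (500, 420), (90, 330)]
# Upright 400x250 rectangle that the label should be mapped onto.
dst = [(0, 0), (400, 0), (400, 250), (0, 250)]

H = homography(src, dst)
mapped = [warp_point(H, x, y) for x, y in src]
```

In practice an image library would resample every pixel through the same 
matrix; the sketch only shows how the matrix is obtained and applied to 
points.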

My questions are:

   1. Why does Tesseract perform so badly on images with poor perspective, 
   compared to alternatives like Vision API 
   <https://cloud.google.com/vision> and PaddleOCR 
   <https://github.com/PaddlePaddle/PaddleOCR>, which are able to extract 
   the text fairly well? Can this be corrected through some sort of 
   fine-tuning in Tesseract? Or is it a weak point of Tesseract that has 
   to be addressed with preprocessing (such as blurring, thresholding, etc.)? 
   If that is the case, the alternatives above seem better, as they do not 
   require such preprocessing.
   2. Despite changing the values for --oem and --psm as shown here 
   <https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method>, 
   the output stays the same. Is this expected?
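
To make "all variations of --oem and --psm" concrete, the sketch below 
enumerates every combination as a tesseract command line. It only builds the 
argument lists and does not run them; the file name box.jpg and the helper 
name build_commands are placeholders for this example. It assumes the 
documented ranges of 0 to 3 for --oem and 0 to 13 for --psm.

```python
from itertools import product

OEM_MODES = range(4)   # --oem accepts 0..3
PSM_MODES = range(14)  # --psm accepts 0..13


def build_commands(image_path):
    """Return one tesseract argument list per (oem, psm) combination."""
    return [
        ["tesseract", image_path, "stdout", "--oem", str(oem), "--psm", str(psm)]
        for oem, psm in product(OEM_MODES, PSM_MODES)
    ]


# "box.jpg" is a placeholder file name.
commands = build_commands("box.jpg")
for cmd in commands[:3]:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run to collect the output per 
combination, which makes it easy to check whether the results really are 
identical or whether some modes fail outright.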

The images are taken from Google and are only representative of the images 
that I am working with.

Thank you so much for your time! 
