[tesseract-ocr] Microscopy label, poor recognition

'Martin Weihrauch' via tesseract-ocr Tue, 21 Dec 2021 02:08:22 -0800

 

I have an image (the label of a microscopy slide) that I thought would be 
easy to OCR, because it is easily readable for humans. I am using the 
latest Tesseract V5 from the command line under Windows. However, with
tesseract image.jpg image.txt --oem 1 --psm x


with "--psm x", x being any of the numbers I tried, the results are poor (it 
misses the bottom line with "LOT40446" and thinks "+" is a "4" after 
binarization of the image I post here). Is there anything I can do to 
improve the results? 
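
To make the "--psm x" attempts repeatable, here is a minimal sketch in Python 
that builds one command line per page segmentation mode. The function name, 
the output naming scheme and the use of "--dpi 300" are assumptions made for 
illustration, not something taken from the original runs. Note that 
tesseract appends ".txt" to the output base name by itself.

```python
import shlex


def build_psm_commands(image="image.jpg", out_base="image", dpi=300):
    """Return one tesseract argument list per page segmentation mode.

    Hypothetical helper: mode 0 (orientation and script detection only)
    is skipped because it does not produce recognized text.
    """
    commands = []
    for psm in range(1, 14):  # tesseract documents modes 0 to 13
        commands.append([
            "tesseract", image, f"{out_base}_psm{psm}",  # ".txt" is appended by tesseract
            "--oem", "1",          # LSTM engine, as in the command above
            "--psm", str(psm),
            "--dpi", str(dpi),     # assumed resolution hint
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell or batch file.
    for argv in build_psm_commands():
        print(shlex.join(argv))
```

Each list can also be passed to subprocess.run unchanged, which makes it 
easier to compare the text files produced by the different modes side by side.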

I tried:

- Binarizing the image

- Setting DPI to 300 dpi

With these, it produced: 

*| +125 PROCock tai*

* | 12/03/2021*

*| 36729/21 344*
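
The post does not say which binarization method was used. As one possible 
illustration, the sketch below applies Otsu's global threshold to a flat list 
of 8-bit gray values. The function names are hypothetical, and loading the 
pixels from the JPEG (which would need an imaging library) is left out.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes the between-class variance.

    pixels: iterable of integers in 0..255. Values <= t form one class,
    values > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray values at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break               # upper class is empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map every gray value to 0 (dark) or 255 (light)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy data standing in for a label: dark ink on a light background.
    gray = [20] * 30 + [230] * 70
    t = otsu_threshold(gray)
    print(t, binarize(gray, t)[:3], binarize(gray, t)[-3:])
```

A single global threshold like this can merge or erase thin strokes when the 
lighting is uneven, which is one way a "+" might end up looking like another 
character after binarization.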


Do you have any suggestions for improvement? On a side note, I tried a9t9, 
the library available in Windows 10, which was a lot better but also had 
weaknesses.

[image: JBOBF.jpg] 

