I have an image (the label of a microscopy slide) that I thought would be easy to OCR, because it is easily readable for humans. I am using the latest Tesseract v5 from the command line under Windows. However, with

    tesseract image.jpg image.txt --oem 1 --psm x

where x is any of the --psm values I tried, the results are poor: it misses the bottom line with "LOT40446" and, after binarization of the image I post here, reads "+" as "4".

Is there anything I can do to improve the results? I tried:

- Binarizing the image
- Setting the DPI to 300

With these, it produced:

    | +125 PROCock tai
     | 12/03/2021
    | 36729/21 344

Do you have any suggestions for improvement?

On a side note, I tried a9t9, a library available in Windows 10, which was a lot better but also had weaknesses.

[image: JBOBF.jpg]
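For reference, the sweep over --psm values described above could be scripted roughly as follows. This is only a minimal sketch, assuming Python 3.7+ and a `tesseract` executable on the PATH; the helper names `build_command` and `sweep` are made up for illustration and are not part of Tesseract. It passes `--dpi 300` on the command line instead of editing the image metadata, and uses `stdout` as the output base so the recognized text is printed rather than written to a file.

```python
import subprocess

# Page segmentation modes accepted by Tesseract's --psm option (0 through 13).
# Some modes (for example 0, which only does orientation/script detection)
# may return nothing useful or need extra data files.
PSM_MODES = range(14)


def build_command(image_path, output_base, psm, dpi=300):
    """Return the argument list for one Tesseract run with the LSTM engine."""
    return [
        "tesseract", image_path, output_base,
        "--oem", "1",        # LSTM engine, as in the command above
        "--psm", str(psm),   # page segmentation mode under test
        "--dpi", str(dpi),   # tell Tesseract the assumed resolution
    ]


def sweep(image_path):
    """Run every PSM mode on one image and return {psm: recognized text}."""
    results = {}
    for psm in PSM_MODES:
        # "stdout" as the output base makes Tesseract print the text.
        cmd = build_command(image_path, "stdout", psm)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[psm] = proc.stdout.strip()
    return results
```

Calling `sweep("image.jpg")` and printing each entry makes it easy to compare the modes side by side, for example to see which of them, if any, picks up the "LOT40446" line.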