[tesseract-ocr] Microscopy label, poor recognition

2021-12-21 Thread 'Martin Weihrauch' via tesseract-ocr
I have an image (label of a microscopy slide), which I thought would be easy to OCR, because it is easily readable for humans. I am using the latest Tesseract V5 as a command line under Windows. However, with tesseract image.jpg image.txt --oem 1 --psm x (with x in "--psm x" being any number), whi
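
Trying several --psm values by hand is tedious, so a small helper can generate one command per mode. The following is only a sketch, not the poster's method: the function name psm_sweep_commands, the output naming scheme, and the default subset of modes are all hypothetical. It also passes an output base without ".txt", because Tesseract appends that extension itself (an output base of image.txt would produce image.txt.txt).

```python
from pathlib import Path


def psm_sweep_commands(image, psm_values=(3, 6, 11, 12), oem=1):
    """Build one tesseract argv list per page segmentation mode.

    The default psm_values tuple is an arbitrary illustrative subset of
    Tesseract's page segmentation modes (0-13), not a recommendation.
    """
    stem = Path(image).stem
    commands = []
    for psm in psm_values:
        # Tesseract appends ".txt" to the output base, so none is given here.
        out_base = f"{stem}_psm{psm}"
        commands.append(
            ["tesseract", str(image), out_base, "--oem", str(oem), "--psm", str(psm)]
        )
    return commands


if __name__ == "__main__":
    # Print the commands; each list could be passed to subprocess.run
    # if tesseract is installed and on PATH.
    for cmd in psm_sweep_commands("image.jpg"):
        print(" ".join(cmd))
```

Comparing the resulting text files side by side shows quickly whether any segmentation mode helps at all, or whether the image itself needs preprocessing.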

Re: [tesseract-ocr] Microscopy label, poor recognition

2021-12-21 Thread Merlijn B.W. Wajer
Hi Martin, Some of the advice below applies to Tesseract 5 only... On 21/12/2021 09:38, 'Martin Weihrauch' via tesseract-ocr wrote: > > > I have an image (label of a microscopy slide), which I thought would be > easy to OCR, because it is easily readable for humans. I am using the > latest T

Re: [tesseract-ocr] Microscopy label, poor recognition

2021-12-21 Thread 'Martin Weihrauch' via tesseract-ocr
Thank you so much for your efforts! Merlijn Wajer wrote on Tuesday, 21 December 2021 at 11:53:44 UTC+1: > Hi Martin, > > Some of the advice below applies to Tesseract 5 only... > > On 21/12/2021 09:38, 'Martin Weihrauch' via tesseract-ocr wrote: > > > > > > I have an image (label of a micro

RE: [tesseract-ocr] Microscopy label, poor recognition

2021-12-21 Thread Art Rhyno
One other idea that might help in a case like this is to use a threshold, using ImageMagick for example (though it adds some garbage):

$ convert -threshold 20% sample.jpg sample.png
$ tesseract --psm 11 sample.png sample
$ more sample.txt
+125 PROCock tai 2 12/03/2021 36729/21 3+4 | > Nb
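
The threshold-then-OCR idea above can be scripted so that different threshold percentages are easy to compare. This is a minimal sketch under stated assumptions: build_commands and run_pipeline are hypothetical helper names, and it assumes both convert (ImageMagick) and tesseract are on PATH. On Windows, convert may resolve to an unrelated system utility, and ImageMagick 7 installs its entry point as magick, so the executable name may need adjusting.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(src, threshold_pct=20, psm=11):
    """Return (threshold_cmd, ocr_cmd) argv lists for one input image."""
    src = Path(src)
    binarized = src.with_suffix(".png")  # sample.jpg -> sample.png
    out_base = src.with_suffix("")       # tesseract appends ".txt" itself
    threshold_cmd = [
        "convert", "-threshold", f"{threshold_pct}%", str(src), str(binarized),
    ]
    ocr_cmd = ["tesseract", str(binarized), str(out_base), "--psm", str(psm)]
    return threshold_cmd, ocr_cmd


def run_pipeline(src, threshold_pct=20, psm=11):
    """Binarize the image, run OCR on the result, and return the text."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    threshold_cmd, ocr_cmd = build_commands(src, threshold_pct, psm)
    subprocess.run(threshold_cmd, check=True)
    subprocess.run(ocr_cmd, check=True)
    return Path(src).with_suffix(".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Show the commands that would be run for the thread's example file.
    for cmd in build_commands("sample.jpg"):
        print(" ".join(cmd))
```

Calling run_pipeline("sample.jpg", threshold_pct=30) would repeat the experiment with a different cut-off; the 20% value and --psm 11 defaults simply mirror the commands quoted in the message.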

[tesseract-ocr] Re: Microscopy label, poor recognition

2021-12-21 Thread Keith M
Martin, I'd normally reply privately here, but I don't think that's an option given the Google Groups configuration. I know you didn't ask this specifically, but I ran your sample image, unmodified, through AWS Textract, and got great results. I'm happy to run a small subset of images through it