I have an image (label of a microscopy slide), which I thought would be
easy to OCR, because it is easily readable for humans. I am using the
latest Tesseract V5 from the command line under Windows. However, with
tesseract image.jpg image.txt --oem 1 --psm x
with "--psm x" x being any number, whi
Hi Martin,
Some of the advice below applies to Tesseract 5 only...
On 21/12/2021 09:38, 'Martin Weihrauch' via tesseract-ocr wrote:
>
>
> I have an image (label of a microscopy slide), which I thought would be
> easy to OCR, because it is easily readable for humans. I am using the
> latest T
Thank you so much for your efforts!
Merlijn Wajer wrote on Tuesday, 21 December 2021 at 11:53:44 UTC+1:
> Hi Martin,
>
> Some of the advice below applies to Tesseract 5 only...
>
> On 21/12/2021 09:38, 'Martin Weihrauch' via tesseract-ocr wrote:
> >
> >
> > I have an image (label of a micro
One other idea that might help in a case like this is to threshold the image
first, using ImageMagick for example (though it adds some garbage):
$ convert -threshold 20% sample.jpg sample.png
$ tesseract --psm 11 sample.png sample
$ more sample.txt
+125
PROCock tai
2
12/03/2021
36729/21 3+4
|
>
Nb
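
If you want to compare several page segmentation modes on the thresholded image, the two commands above could be wrapped in a small loop. This is only a sketch: the `DRY_RUN` variable, the `run` and `ocr_all_modes` helpers, and the `sample_psmN` output names are made up for illustration, and the list of modes is just a guess at ones worth trying on a label like this.

```shell
#!/bin/sh
# Sketch: threshold once, then try several page segmentation modes.
# DRY_RUN, run() and ocr_all_modes() are hypothetical helpers for this
# example; they are not part of Tesseract or ImageMagick.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"    # show the command instead of running it
    else
        "$@"
    fi
}

ocr_all_modes() {
    img="$1"
    base="${img%.*}"          # sample.jpg -> sample
    # Same threshold step as above (assumes the input is not already a .png).
    run convert -threshold 20% "$img" "$base.png"
    # Modes chosen arbitrarily for illustration; 11 is the one used above.
    for psm in 4 6 11 12; do
        run tesseract --psm "$psm" "$base.png" "${base}_psm$psm"
    done
}
```

With `DRY_RUN=0` set, calling `ocr_all_modes sample.jpg` would actually run the commands and leave one text file per mode (sample_psm4.txt, sample_psm6.txt, and so on) to compare side by side.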
Martin,
I'd normally reply privately here, but I don't think that's an option given
the Google Groups configuration.
I know you didn't ask this specifically, but I ran your sample image,
unmodified, through AWS Textract, and got great results. I'm happy to run
a small subset of images through it