Re: [tesseract-ocr] getting started, but no results

Yesbird Tue, 28 Dec 2021 14:23:52 -0800

I believe the key is the page segmentation mode (PSM). Try experimenting with it:
https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/
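
As a rough sketch of what "playing with it" could look like, the snippet below builds one tesseract command line per page segmentation mode so the outputs can be compared. It assumes the command-line tesseract binary rather than the C# wrapper, and the helper names, the file name tessinput.tif, and the choice of modes (6 = uniform block, 7 = single line, 10 = single character, 11 = sparse text) are only illustrative. The digit whitelist is passed with -c tessedit_char_whitelist; whether it is honoured can depend on the Tesseract version and engine.

```python
import shlex

# PSM values accepted by the tesseract CLI (0 through 13).
VALID_PSMS = range(0, 14)


def build_tesseract_args(image_path, psm, digits_only=True):
    """Return the argument list for one tesseract CLI run (hypothetical helper)."""
    if psm not in VALID_PSMS:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    args = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if digits_only:
        # Restrict recognition to digits; support may vary by engine and version.
        args += ["-c", "tessedit_char_whitelist=0123456789"]
    return args


def candidate_commands(image_path, psms=(6, 7, 10, 11)):
    """Return one shell-quoted command line per PSM worth trying."""
    return [shlex.join(build_tesseract_args(image_path, p)) for p in psms]


if __name__ == "__main__":
    for cmd in candidate_commands("tessinput.tif"):
        print(cmd)
```

Running each printed command on the same image and comparing the output is usually quicker than changing several config variables at once.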



On Tuesday, December 28, 2021 at 11:45:54 PM UTC+3 thisism...@gmail.com 
wrote:

> Thanks, yes, I had looked at that.
> I began by expanding the image by 5x to get the characters to about 50 
> pixels high (vs about 8 initially).
> My initial tests generated a tessinput.tif that looked very good to my 
> eye, but did not work for the OCR.
> I ended up also doing: 
> - posterize to level 2 to reduce the colors
> - dilate, to reduce the thickness of the characters
> But this still was not working.
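
For reference, the two preprocessing steps described above (enlarging the image and posterizing to 2 levels) can be sketched in plain Python. This is only an illustration under assumptions: the image is a grayscale list of rows with values 0-255, the enlargement is nearest-neighbour (the original tool may have interpolated), and the function names are made up for this example.

```python
def upscale(image, factor):
    """Enlarge a row-major grayscale image by repeating each pixel factor x factor times."""
    return [
        [px for px in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def posterize(image, levels=2):
    """Quantize 0-255 values to `levels` evenly spaced gray levels."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    steps = levels - 1
    return [
        # Round to the nearest level index, then map the index back to 0-255.
        [int(px * steps / 255 + 0.5) * 255 // steps for px in row]
        for row in image
    ]


if __name__ == "__main__":
    sample = [[0, 100], [200, 255]]
    print(posterize(upscale(sample, 5), levels=2)[0])
```

With levels=2 this amounts to a fixed threshold at mid-gray, so an uneven background may need a different threshold than the default.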
>
> I suspect the isolated single characters and the lines between them are 
> causing the problem. 
> I tried several of the many settings in the config file, hoping to 
> find ones that would work, but got nowhere and felt I was shooting 
> in the dark.
> As I am unfamiliar with these settings and could not find details on 
> what many of them mean, I was hoping to get some ideas on 
> which ones might be helpful.
>
> In the end I defined rectangles for the position of each character, 
> then copied all these rectangles to a new image, placing the characters in 
> neat rows.
> This worked on the sample image I have, but I do not yet have additional 
> samples to see if it works on them.
> I had hoped to avoid coding the exact position of the 
> characters and that Tesseract might read them as is.
> I will see later, when more samples arrive, whether this is a workable solution.
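
The rectangle approach described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: it assumes a grayscale image stored as a list of rows, rectangles given as (x, y, width, height) and known in advance, and hypothetical function names. A real implementation would use an image library instead of nested lists.

```python
def crop(image, rect):
    """Copy the pixels inside rect = (x, y, width, height) from a row-major image."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def repack(image, rects, per_row, pad=2, background=255):
    """Paste each cropped rectangle into a grid of equal-sized, padded cells."""
    cell_w = max(w for _, _, w, _ in rects) + 2 * pad
    cell_h = max(h for _, _, _, h in rects) + 2 * pad
    n_rows = -(-len(rects) // per_row)  # ceiling division
    out = [[background] * (cell_w * per_row) for _ in range(cell_h * n_rows)]
    for i, rect in enumerate(rects):
        top = (i // per_row) * cell_h + pad
        left = (i % per_row) * cell_w + pad
        for dy, row in enumerate(crop(image, rect)):
            # Overwrite the background with the character's pixels.
            out[top + dy][left:left + len(row)] = row
    return out


if __name__ == "__main__":
    image = [[r * 10 + c for c in range(6)] for r in range(4)]
    packed = repack(image, [(0, 0, 2, 2), (3, 1, 2, 2)], per_row=2, pad=1)
    for row in packed:
        print(row)
```

The padding leaves a background margin around every character, which also drops the grid lines between cells as long as the rectangles exclude them.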
>
> On Tuesday, December 28, 2021 at 4:33:56 AM UTC-5 zdenop wrote:
>
>> Did you read the docs?
>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>
>> Zdenko
>>
>>
>> On Tue, 28 Dec 2021 at 10:28, michael c <thisism...@gmail.com> wrote:
>>
>>> I am just starting to use the Tesseract package and am having no luck 
>>> getting it to recognize anything.
>>> My environment is C#, using the package from NuGet.
>>> It runs fine, but no text is recognized in my sample image.
>>> It does work on the provided 'phototest.tif'.
>>> I have fiddled with many parameters in the config file, and none has 
>>> resulted in any useful output.
>>> I only need to recognize digits, and the images will all have the same 
>>> consistent form as this one.
>>> Any hints on which parameters I should look at to get this running?
>>>
>>> [image: 20211222_Capture_cut.PNG]
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/d12cad94-ad9c-4659-87bc-94a57b58a4e1n%40googlegroups.com.
>>>
>>

