Re: [tesseract-ocr] Improve text extraction when some text is inverted

2021-07-02 Thread 'Chris' via tesseract-ocr
Thanks to both of you for replying. I'm using Charles Weld's NuGet package (https://github.com/charlesw/tesseract/), so at the moment I think I am stuck on version 4.1.1. I have to admit Tesseract is a bit of a black box to me, and short of setting a few variables I am at a bit of a loss [...]

2021-07-02 Thread Zdenko Podobny
You provided no example, so just a hint: have a look at the Leptonica function pixAutoPhotoinvert [1], which should help in such cases. The function is available, IMO, from version 1.79.0.

[1] https://github.com/DanBloomberg/leptonica/blob/5aaf1c187deeef7f47288c6b0833a07021940da7/src/pageseg.c#L2370-L2391

Z
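
To make the idea behind that hint concrete, here is a minimal pure-Python sketch of region-wise photo-inversion. It is not Leptonica's algorithm and not the pixAutoPhotoinvert API; it swaps in a much simpler tile-based heuristic. The function name `invert_dark_tiles`, the `tile` and `thresh` parameters, and the list-of-lists image format (1 = black pixel) are all assumptions made for illustration only.

```python
def invert_dark_tiles(img, tile=4, thresh=0.5):
    """Return a copy of a binary image with mostly-black tiles inverted.

    img    -- list of rows, each a list of 0/1 values (1 = black); hypothetical format
    tile   -- side length of the square tiles examined (assumed value)
    thresh -- fraction of black pixels above which a tile counts as inverted (assumed value)
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]  # work on a copy, leave the input untouched
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            ys = range(y0, min(y0 + tile, h))
            xs = range(x0, min(x0 + tile, w))
            black = sum(img[y][x] for y in ys for x in xs)
            # A tile dominated by black is assumed to be white-on-black text.
            if black > thresh * len(ys) * len(xs):
                for y in ys:
                    for x in xs:
                        out[y][x] = 1 - img[y][x]
    return out


# Left tile: normal black-on-white. Right tile: the same glyph, inverted.
page = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
fixed = invert_dark_tiles(page)
```

In this toy case the right-hand tile is flipped so both halves end up black-on-white. A real document would need regions found from the image content rather than a fixed grid, which is the kind of work the Leptonica function is meant to do.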

2021-07-02 Thread Merlijn B.W. Wajer
Hi,

On 01/07/2021 18:39, 'Chris' via tesseract-ocr wrote:
> I am experimenting with Tesseract 4.1.1 using C# to extract text from black
> and white or greyscale TIF images of semi structured forms that are 300
> dpi.
>
> The results are really promising except when some of the text is inverted