Re: [tesseract-ocr] Improve text extraction when some text is inverted

2021-07-02 Thread 'Chris' via tesseract-ocr
Thanks to both of you for replying. I'm using Charles Weld's NuGet package (https://github.com/charlesw/tesseract/), so at the moment I think I am stuck on version 4.1.1. I have to admit Tesseract is a bit of a black box to me, and short of setting a few variables I am at a bit of a los

Re: [tesseract-ocr] Improve text extraction when some text is inverted

2021-07-02 Thread Zdenko Podobny
You provided no example, so just a hint: have a look at the Leptonica function pixAutoPhotoinvert [1], which should help in such cases. The function is available, IMO, from version 1.79.0. [1] https://github.com/DanBloomberg/leptonica/blob/5aaf1c187deeef7f47288c6b0833a07021940da7/src/pageseg.c#L2370-L2391 Z
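
To make the idea behind that hint concrete, here is a minimal sketch of selective inversion: find regions that are mostly dark and flip only those before handing the image to OCR. This is not Leptonica's algorithm and not a binding to pixAutoPhotoinvert; the function name invert_dark_tiles, the fixed tile grid, and the threshold values are all assumptions made for illustration. The image is assumed to be 8-bit greyscale, held as a plain list of rows.

```python
def invert_dark_tiles(image, tile=32, dark_level=128, dark_fraction=0.6):
    """Return a copy of `image` with predominantly dark tiles inverted.

    image: list of rows, each a list of 8-bit grey values (0 = black).
    tile, dark_level and dark_fraction are illustrative defaults, not
    values taken from Leptonica or Tesseract.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # leave the caller's image untouched

    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            total = len(rows) * len(cols)

            # Count pixels darker than the cut-off inside this tile.
            dark = sum(1 for y in rows for x in cols
                       if image[y][x] < dark_level)

            # A mostly dark tile is assumed to hold white-on-black text,
            # so flip it to black-on-white.
            if dark / total > dark_fraction:
                for y in rows:
                    for x in cols:
                        out[y][x] = 255 - image[y][x]
    return out
```

A fixed grid is the crudest possible region finder: a tile that straddles the edge of an inverted block may be left alone or flipped wholesale. A real implementation would locate the inverted regions first (for example via connected components) and invert exactly those, which is the kind of work the Leptonica function is meant to do for you.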

Re: [tesseract-ocr] Improve text extraction when some text is inverted

2021-07-02 Thread Merlijn B.W. Wajer
Hi,

On 01/07/2021 18:39, 'Chris' via tesseract-ocr wrote:
> I am experimenting with Tesseract 4.1.1 using C# to extract text from black
> and white or greyscale TIF images of semi structured forms that are 300
> dpi.
>
> The results are really promising except when some of the text is inverted

[tesseract-ocr] Improve text extraction when some text is inverted

2021-07-01 Thread 'Chris' via tesseract-ocr
I am experimenting with Tesseract 4.1.1 using C# to extract text from black and white or greyscale TIF images of semi-structured forms that are 300 dpi. The results are really promising except when some of the text is inverted (i.e. white on black). In these cases the results are poor. Can anyon