Thanks to both of you for replying. I'm using Charles Weld's NuGet package (https://github.com/charlesw/tesseract/), so at the moment I think I am stuck on version 4.1.1. I have to admit Tesseract is a bit of a black box to me, and beyond setting a few variables I am at a bit of a loss as to how to use it.
I'm not sure whether I can call Leptonica directly, and I'm unsure whether my questions are better directed here or to Charles Weld. Having looked at the pixAutoPhotoinvert code, I could try to implement something similar in C# before handing the image to Tesseract. Thanks for that. Worst case, I could get Tesseract to read both the original image and an inverted copy and then combine the results. Whilst simpler, that would double the processing time.

If it helps, I could provide a sample C# project next week.

Chris

On Friday, 2 July 2021 at 11:56:26 UTC+1 zdenop wrote:

> You provided no example, so just a hint: have a look at the leptonica
> function pixAutoPhotoinvert[1], that should help in such cases. Function is
> available IMO from version 1.79.0
>
> [1]
> https://github.com/DanBloomberg/leptonica/blob/5aaf1c187deeef7f47288c6b0833a07021940da7/src/pageseg.c#L2370-L2391
>
> Zdenko
>
>
> On Fri, 2 Jul 2021 at 8:11 'Chris' via tesseract-ocr <tesser...@googlegroups.com>
> wrote:
>
>> I am experimenting with Tesseract 4.1.1 using C# to extract text from
>> black and white or greyscale TIF images of semi-structured forms that are
>> 300 dpi.
>>
>> The results are really promising except when some of the text is inverted
>> (i.e. white on black). In these cases the results are poor. Can anyone
>> suggest ways to tackle this? All the discussions I have seen are for when the
>> whole image is inverted, but here it is only some of the text.
>>
>> Regards,
>>
>> Chris
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/9681ae4f-f443-4a92-b1f4-e2a8919981a9n%40googlegroups.com
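
Leptonica's pixAutoPhotoinvert, going by the pageseg.c source linked above, binarizes the page, builds a mask of candidate regions, keeps only regions where roughly 60% or more of the pixels are foreground, and XORs that mask back into the binary image so dark panels become light ones. If the wrapper does not expose that function, a much-simplified, dependency-free C# sketch of the same idea might look like the following. It is an approximation under stated assumptions, not a port: it uses 8-connected components of dark pixels instead of Leptonica's halftone mask and morphology, the image is assumed to already be an 8-bit grayscale `byte[y, x]` array (0 = black), and the names `PhotoInvert`, `AutoInvertRegions` and the default `minSize` of 30 px are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a simplified take on the idea behind Leptonica's
// pixAutoPhotoinvert. It is NOT a port of that function.
public static class PhotoInvert
{
    // gray[y, x]: 0 = black, 255 = white. Returns a binary image (0 or 255)
    // in which large, mostly-dark regions have been inverted.
    // minSize (assumed value) and minFgFraction both need tuning per document type.
    public static byte[,] AutoInvertRegions(byte[,] gray, byte thresh = 128,
                                            int minSize = 30, double minFgFraction = 0.6)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);

        // 1. Binarize: true = dark (foreground) pixel.
        var dark = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dark[y, x] = gray[y, x] < thresh;

        // 2. Flood-fill 8-connected dark components, tracking each bounding box.
        var seen = new bool[h, w];
        var invertMask = new bool[h, w];
        var queue = new Queue<(int y, int x)>();
        for (int sy = 0; sy < h; sy++)
            for (int sx = 0; sx < w; sx++)
            {
                if (!dark[sy, sx] || seen[sy, sx]) continue;
                int minY = sy, maxY = sy, minX = sx, maxX = sx;
                seen[sy, sx] = true;
                queue.Enqueue((sy, sx));
                while (queue.Count > 0)
                {
                    var (y, x) = queue.Dequeue();
                    minY = Math.Min(minY, y); maxY = Math.Max(maxY, y);
                    minX = Math.Min(minX, x); maxX = Math.Max(maxX, x);
                    for (int dy = -1; dy <= 1; dy++)
                        for (int dx = -1; dx <= 1; dx++)
                        {
                            int ny = y + dy, nx = x + dx;
                            if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                            if (dark[ny, nx] && !seen[ny, nx])
                            {
                                seen[ny, nx] = true;
                                queue.Enqueue((ny, nx));
                            }
                        }
                }

                // 3. Skip small components such as ordinary characters and specks.
                int bw = maxX - minX + 1, bh = maxY - minY + 1;
                if (bw < minSize || bh < minSize) continue;

                // 4. Mark the box for inversion only if it is mostly dark
                //    (a frame or table border has a low fill fraction and is kept).
                int darkCount = 0;
                for (int y = minY; y <= maxY; y++)
                    for (int x = minX; x <= maxX; x++)
                        if (dark[y, x]) darkCount++;
                if ((double)darkCount / (bw * bh) < minFgFraction) continue;
                for (int y = minY; y <= maxY; y++)
                    for (int x = minX; x <= maxX; x++)
                        invertMask[y, x] = true;
            }

        // 5. XOR the binary image with a single union mask, so overlapping
        //    boxes are inverted once rather than twice.
        var result = new byte[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                result[y, x] = (dark[y, x] ^ invertMask[y, x]) ? (byte)0 : (byte)255;
        return result;
    }
}
```

Converting the TIF into such an array, and the result back into an image for Tesseract, is left out because it depends on which imaging library is available. The fill-fraction test is what keeps large but sparse shapes, such as form boxes, from being inverted.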
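
The fallback Chris mentions, running OCR on both the original and an inverted copy and combining the results, needs a rule for which pass wins where they overlap. One possible rule, sketched here under assumptions: collect word bounding boxes and confidences from each pass (with the wrapper this would presumably come from iterating the page results at word level, which is not shown), keep every word, and where boxes from the two passes overlap keep the higher-confidence word. The `OcrWord` record and `TwoPassMerge` class are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record for one recognised word: bounding box in image pixel
// coordinates, recognised text and a confidence score.
public record OcrWord(int X, int Y, int Width, int Height, string Text, float Confidence);

public static class TwoPassMerge
{
    // True if the two bounding boxes intersect.
    static bool Overlaps(OcrWord a, OcrWord b) =>
        a.X < b.X + b.Width && b.X < a.X + a.Width &&
        a.Y < b.Y + b.Height && b.Y < a.Y + a.Height;

    // Keep words from both passes; where a word overlaps one from the other
    // pass, the higher confidence wins (ties go to the normal pass).
    public static List<OcrWord> Merge(IReadOnlyList<OcrWord> normal, IReadOnlyList<OcrWord> inverted)
    {
        var result = new List<OcrWord>();
        foreach (var w in normal)
            if (!inverted.Any(v => Overlaps(w, v) && v.Confidence > w.Confidence))
                result.Add(w);
        foreach (var v in inverted)
            if (!normal.Any(w => Overlaps(w, v) && w.Confidence >= v.Confidence))
                result.Add(v);
        // Crude reading order: by top edge, then by left edge.
        return result.OrderBy(w => w.Y).ThenBy(w => w.X).ToList();
    }
}
```

Because inverting an image does not move any pixels, boxes from the two passes share one coordinate system, which is what makes the overlap test meaningful. The cost remains two full recognition passes per page.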