Thanks to both of you for replying. I'm using Charles Weld's NuGet package 
(https://github.com/charlesw/tesseract/) so at the moment I think I am 
stuck on version 4.1.1. I have to admit Tesseract is a bit of a black box 
to me, and beyond setting a few variables I am at a bit of a loss in its 
use.

I'm not sure if I have access to calling Leptonica, and am unsure if my 
questions are better directed here or to Charles Weld.

Having looked at the pixAutoPhotoinvert code, I could try to implement 
something similar in C# before handing the image to Tesseract. Thanks for 
that. Worst case, I could get Tesseract to look at both the original image 
and an inverted image and then combine the results. Whilst simpler, that 
would double the time taken.
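
To make the first idea concrete, here is a minimal sketch of the kind of pre-processing I have in mind, written in Python rather than C# purely for brevity. It is not a port of pixAutoPhotoinvert. It assumes the candidate regions are already known as boxes (a hypothetical input; finding them is the hard part), and the 128 grey threshold and 0.6 dark-pixel fraction are illustrative guesses rather than values taken from Leptonica.

```python
def dark_fraction(img, box, threshold=128):
    """Fraction of pixels in box darker than threshold.

    img is a list of rows of 8-bit grey values (0 = black, 255 = white).
    box is (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    total = (x1 - x0) * (y1 - y0)
    if total <= 0:
        return 0.0
    dark = sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if img[y][x] < threshold
    )
    return dark / total


def invert_dark_regions(img, boxes, min_dark_fraction=0.6, threshold=128):
    """Return a copy of img with mostly-dark boxes photometrically inverted.

    A box that is mostly dark is assumed to hold white-on-black text,
    so its pixels are flipped to give black-on-white text for OCR.
    """
    out = [row[:] for row in img]  # copy so the input stays untouched
    for box in boxes:
        if dark_fraction(img, box, threshold) >= min_dark_fraction:
            x0, y0, x1, y1 = box
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = 255 - img[y][x]
    return out
```

The appeal over the run-it-twice fallback is that Tesseract would only see each image once, at the cost of having to locate the dark regions beforehand.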

If it helps I could provide a sample C# project next week.

Chris
On Friday, 2 July 2021 at 11:56:26 UTC+1 zdenop wrote:

> You provided no example, so just a hint: have a look at the Leptonica 
> function pixAutoPhotoinvert [1], which should help in such cases. The 
> function is available, I believe, from version 1.79.0.
>
> [1] 
> https://github.com/DanBloomberg/leptonica/blob/5aaf1c187deeef7f47288c6b0833a07021940da7/src/pageseg.c#L2370-L2391
>
> Zdenko
>
>
> On Fri, 2 Jul 2021 at 8:11, 'Chris' via tesseract-ocr 
> <tesser...@googlegroups.com> wrote:
>
>> I am experimenting with Tesseract 4.1.1 using C# to extract text from 
>> black and white or greyscale TIF images of semi structured forms that are 
>> 300 dpi. 
>>
>> The results are really promising except when some of the text is inverted 
>> (i.e. white on black). In these cases the results are poor. Can anyone 
>> suggest ways to tackle this? All the discussions I have seen are for when 
>> the whole image is inverted, but here it is only some of the text.
>>
>> Regards,
>>
>> Chris
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/9681ae4f-f443-4a92-b1f4-e2a8919981a9n%40googlegroups.com.
>>
>
