> Bizarrely, this simple code finds the page numbers at the bottom of the 
page perfectly happily, whereas the tesseract executable did not.  This is 
good news - though confusing...

IIRC, the library defaults the page segmentation mode (psm) to 
PSM_SINGLE_BLOCK = 6, while the tesseract CLI sets it to PSM_AUTO = 3 when 
you don't specify it explicitly. Those two different psm 'defaults' may 
well explain the discrepancy you observe.
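
One way to check this would be to run the executable with the psm given 
explicitly and compare the results. The following is only a minimal sketch: 
the helper names (build_tesseract_cmd, run_ocr) are made up for illustration, 
it assumes the tesseract executable is on PATH and that the 'eng' language 
data is installed, and which mode (if any) picks up the lone digit on your 
image is something to find out by trying.

```python
import shutil
import subprocess

# Page segmentation mode numbers as listed by `tesseract --help-psm`.
PSM_AUTO = 3          # what the CLI uses when --psm is not given
PSM_SINGLE_BLOCK = 6  # the library default, as far as I recall
PSM_SINGLE_LINE = 7
PSM_SINGLE_CHAR = 10


def build_tesseract_cmd(image_path, psm=PSM_SINGLE_BLOCK, lang="eng"):
    """Return the argument list for a tesseract CLI run with an explicit psm.

    Hypothetical helper; "stdout" as the output base makes the CLI print
    the recognised text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, psm=PSM_SINGLE_BLOCK):
    """Run the CLI and return its text output, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm),
        capture_output=True,
        text=True,
        check=False,  # a failed run just yields empty output here
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # 101_bottom.jpg is the sample image name mentioned in this thread.
    for psm in (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_SINGLE_LINE, PSM_SINGLE_CHAR):
        print(psm, repr(run_ocr("101_bottom.jpg", psm)))
```

If the psm 6 run finds the page number while the psm 3 run reports an empty 
page, that would support the explanation above; passing --psm 6 on the 
command line should then make the executable behave like the sample program.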


On Sunday, July 14, 2024 at 9:20:59 AM UTC+2 ia...@idcl.co.uk wrote:

> I have FINALLY got the c++ samples working in Visual Studio 2022. The code 
> I am using is the first tesseract sample code from here 
> <https://tesseract-ocr.github.io/tessdoc/Examples_C++.html> .
>
> Bizarrely, this simple code finds the page numbers at the bottom of the 
> page perfectly happily, whereas the tesseract executable did not.  This is 
> good news - though confusing...
>
> Thanks to all for your input on this - I think for the moment I'm enough 
> ahead that I can call this issue closed.  I will be seeing if I can 
> replicate this in c# which is a more productive environment for me than C++.
>
> Iain
>
>
> On Sunday, July 14, 2024 at 7:47:47 AM UTC+1 Iain Downs wrote:
>
>> Apologies.  The Python file is in the Google Groups thread, but for some 
>> reason it didn't come down with the email.
>>
>>  
>>
>> Also, I now have a sample program (nearly) working in C++.  My last step 
>> was to copy all the dlls from the vcpkg install into the source directory, 
>> otherwise they weren’t found when running.  I’m left with setting the 
>> location of the language file and it should work.  But the python will be 
>> helpful nonetheless.
>>
>>  
>>
>> Iain
>>
>>  
>>
>> *From:* tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] *On 
>> Behalf Of *Dominic Mukilan
>> *Sent:* 13 July 2024 17:42
>> *To:* tesser...@googlegroups.com
>> *Subject:* Re: [tesseract-ocr] Tessarct won't recognise single characters
>>
>>  
>>
>> Attaching the python file, the supporting files, and requirements.txt
>>
>>  
>>
>> On Sat, 13 Jul 2024 at 21:56, Iain Downs <ia...@idcl.co.uk> wrote:
>>
>> Can you give me some example code?  I'm currently trying to get tesseract 
>> working for C++ in Visual Studio and it's a bit of a nightmare.  python 
>> seems easier though it's not one of my main languages - I can try it out 
>> though!
>>
>>  
>>
>> Iain
>>
>> On Saturday, July 13, 2024 at 11:20:54 AM UTC+1 renec...@gmail.com wrote:
>>
>> Hi,
>>
>> I tried your example with tesseract for Python - it works well.
>>
>>  
>>
>> On Thu, 11 Jul 2024 at 20:35, Iain Downs <ia...@idcl.co.uk> wrote:
>>
>> I'm trying to extract page numbers from scanned pages of text.  Page 
>> Numbers are either at the top or at the bottom - sometimes with titles / 
>> authors / chapters.  Occasionally elsewhere, but I don't care about the 
>> exceptions.
>>
>>  
>>
>> I've loaded tesseract 5.4 (windows) and run some tests using the 
>> executable.  I'm finding that if the page number is a single digit on the 
>> line, tesseract ignores it (but otherwise does a fantastic job of OCR even 
>> with skewed and noisy images).
>>
>>  
>>
>> I've isolated the single line and used that as input, and tesseract tells 
>> me 'the page is empty'.
>>
>>  
>>
>> Here is a sample of a single line with a '1' in it; the resolution is 300dpi.
>>
>> [image: Image removed by sender. 101_bottom.jpg]
>>
>>  
>>
>> Ultimately I would be writing a program using tesseract, but in the first 
>> instance I'd like to see it work with the exe.
>>
>>  
>>
>> So, can I tell tesseract to be less fussy with individual characters and, 
>> if not, how would I do so programmatically - if possible?
>>
>>  
>>
>> Thanks
>>
>>  
>>
>> Iain
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/c42d435c-4db5-48b5-94d3-5b761d340731n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/c42d435c-4db5-48b5-94d3-5b761d340731n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/add62795-e1d2-47e9-a8b0-43bcc3f9832fn%40googlegroups.com.
