[tesseract-ocr] Any way to stop ocr after set time period?

Ajg Thu, 03 Apr 2025 09:12:57 -0700

I have an OCR program that reads and interprets many documents of
different composition. Some documents are PDFs whose first page is an
image, with the text on the second (or later) pages. When that first page
is an image with little or no text on it, the GetText() call can take
several minutes or more before processing gets past it. The application
is a .NET WinForms app. PDF pages with lots of text work fine.


The relevant C# code is:

using Tesseract;

// Convert the rendered page bitmap to a Leptonica Pix before running OCR
using var img = PixConverter.ToPix(save_bitmap);

using var ocr = new TesseractEngine(..."tessdata5.2",
                                    "eng",
                                    EngineMode.LstmOnly);
using var page = ocr.Process(img, PageSegMode.AutoOsd);
ocrtext = page.GetText();   /* long time here */

I do need to collect text from subsequent pages for indexing documents. 
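A minimal sketch of one possible approach, in C# to match the code above and assuming only the standard Task API: run each page's OCR call on a worker task, stop waiting once a per-page time budget is used up, and move on to the next page. PageTimeout, TryRun and CollectText are hypothetical names, not part of the Tesseract wrapper. Note that this only abandons the wait; it does not abort the native GetText() call, which keeps running in the background.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Tesseract wrapper): runs a blocking,
// OCR-style call on a worker task and stops *waiting* after a time budget.
// The worker itself is NOT aborted; it keeps running until it finishes.
public static class PageTimeout
{
    public static (bool Completed, string Text) TryRun(Func<string> work, TimeSpan timeout)
    {
        Task<string> task = Task.Run(work);
        // Task.Wait(TimeSpan) returns false if the task did not finish in time.
        // If the work throws, Wait rethrows it wrapped in an AggregateException.
        bool completed = task.Wait(timeout);
        return completed ? (true, task.Result) : (false, string.Empty);
    }

    // Collects text page by page, skipping any page that exceeds its budget.
    public static List<string> CollectText(IEnumerable<Func<string>> pages, TimeSpan perPage)
    {
        var results = new List<string>();
        foreach (Func<string> page in pages)
        {
            var (completed, text) = TryRun(page, perPage);
            if (completed)
            {
                results.Add(text);
            }
            // Otherwise the slow page is skipped and the loop moves on.
        }
        return results;
    }
}
```

With Tesseract, each Func<string> would wrap the existing Process()/GetText() calls for one page. Because an abandoned call is still using its TesseractEngine (as far as I know the wrapper allows only one page to be processed per engine at a time), a timed-out page would likely need a fresh engine for the following pages; actually stopping the work would need something like running OCR in a separate process that can be killed.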

Thanks in advance for any comments you may have.  

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/tesseract-ocr/daff593f-01f3-4d09-acc4-a72ed39d4a98n%40googlegroups.com.
