Can you give me some example code? I'm currently trying to get Tesseract working from C++ in Visual Studio, and it's a bit of a nightmare. Python seems easier; it's not one of my main languages, but I can try it out!
Iain

On Saturday, July 13, 2024 at 11:20:54 AM UTC+1 [email protected] wrote:

> Hi,
> I tried your example with tesseract for Python - it works well
>
> On Thu, 11 Jul 2024 at 20:35, Iain Downs <[email protected]> wrote:
>
>> I'm trying to extract page numbers from scanned pages of text. Page
>> numbers are either at the top or at the bottom - sometimes with titles /
>> authors / chapters. Occasionally elsewhere, but I don't care about the
>> exceptions.
>>
>> I've loaded tesseract 5.4 (Windows) and run some tests using the
>> executable. I'm finding that if the page number is a single digit on the
>> line, tesseract ignores it (but otherwise does a fantastic job of OCR even
>> with skewed and noisy images).
>>
>> I've isolated the single line and used that as input, and tesseract tells
>> me 'the page is empty'.
>>
>> Here is a sample of a single line with a '1' in it; resolution is 300 dpi.
>> [image: 101_bottom.jpg]
>>
>> Ultimately I would be writing a program using tesseract, but in the first
>> instance I'd like to see it work with the exe.
>>
>> So, can I tell tesseract to be less fussy with individual characters and,
>> if not, how would I do so programmatically - if possible?
>>
>> Thanks
>>
>> Iain
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/c42d435c-4db5-48b5-94d3-5b761d340731n%40googlegroups.com
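For the single-digit case described in the quoted message, one thing that may be worth trying is a different page segmentation mode together with a digit whitelist, since the default mode expects a full page of text. The sketch below is only that: it drives the tesseract executable from Python, assumes `tesseract` is on the PATH, and uses the real CLI options `--psm` and `-c tessedit_char_whitelist=...`. The helper names `build_tesseract_command` and `ocr_page_number`, and the choice to try modes 7 (single line), 10 (single character) and 13 (raw line) in that order, are assumptions made for illustration. It has not been run against the sample image, so it may not fix this particular case.

```python
import subprocess
import sys


def build_tesseract_command(image_path, psm=10, whitelist="0123456789"):
    """Return the argument list for one tesseract CLI run (hypothetical helper)."""
    return [
        "tesseract", image_path, "stdout",             # write recognised text to stdout
        "--psm", str(psm),                             # page segmentation mode
        "-c", f"tessedit_char_whitelist={whitelist}",  # restrict output to these characters
    ]


def ocr_page_number(image_path, psm_modes=(7, 10, 13)):
    """Try each segmentation mode in turn; return the first non-empty result."""
    for psm in psm_modes:
        result = subprocess.run(
            build_tesseract_command(image_path, psm),
            capture_output=True, text=True, check=False,
        )
        text = result.stdout.strip()
        if text:
            return text
    return ""  # nothing recognised in any of the modes tried


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python page_number.py 101_bottom.jpg
    print(ocr_page_number(sys.argv[1]) or "no text found")
```

The same options can be passed to the exe directly, for example `tesseract 101_bottom.jpg stdout --psm 10 -c tessedit_char_whitelist=0123456789`, which is a quick way to check whether a given mode helps before writing any program around it.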

