Re: [tesseract-ocr] Any success story?

2023-11-14 Thread Keith Smith
The short answer is "no", but a fuller answer is that my use case is a bit different from others and is as follows ... I trained tesseract to read the MICR line at the bottom of bank checks using only 20K checks (i.e. real data, not synthetic). I was able to get 85% accuracy where the reason for

Re: [tesseract-ocr] Any success story?

2023-11-14 Thread Merlijn B.W. Wajer
Hi, On 14/11/2023 06:55, Des Bw wrote: "It looks like everyone is having issues with tesseract. I am not able to find anyone who has had great success with this software. It would be really encouraging to hear any success story from any language." Here's one for you: https://blog.archive.org/2

[tesseract-ocr] tesseract-ocr is not converting or extracting the text properly

2023-11-14 Thread Arul Britto Kumar Abraham
Hi, I am using tesseract-ocr in my Python code to convert a non-searchable PDF into a searchable PDF document, but it is not converting fully... I am using "poppler-23.08.0" to convert the PDF pages to images, and from each image I am using the "pytesseract.image_to_pdf_or_hocr" method to convert to PDF files a
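
For readers following along, the pipeline described above (PDF pages rendered to images with Poppler, then each image passed to pytesseract.image_to_pdf_or_hocr) might look roughly like the sketch below. It is not the poster's code. It assumes the pdf2image package as the Poppler wrapper, which the message does not name, and the function names, output file naming, and the 300 DPI default are all hypothetical choices made for illustration. The excerpt does not say why the conversion is incomplete, so this only shows the flow. Render resolution is one setting that is often worth checking.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def ocr_pages_to_pdfs(
    pages: Iterable,
    out_dir: Path,
    ocr_page: Callable[[object], bytes],
) -> List[Path]:
    """Write one PDF per page image and return the written paths.

    `ocr_page` is any callable that turns a page image into PDF bytes
    (hypothetical hook, so the OCR backend can be swapped or faked).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, page in enumerate(pages, start=1):
        target = out_dir / f"page_{number:04d}.pdf"
        target.write_bytes(ocr_page(page))
        written.append(target)
    return written


def tesseract_ocr_page(image) -> bytes:
    """Return a searchable single-page PDF for one image, as bytes."""
    # Imported lazily so the helper above works without Tesseract installed.
    import pytesseract

    return pytesseract.image_to_pdf_or_hocr(image, extension="pdf")


def convert(pdf_path: str, out_dir: str, dpi: int = 300) -> List[Path]:
    """Render every page of `pdf_path` and OCR it into `out_dir`."""
    # Assumption: pdf2image is used to call Poppler; 300 DPI is a guess.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)
    return ocr_pages_to_pdfs(pages, Path(out_dir), tesseract_ocr_page)
```

Calling convert("scan.pdf", "out") with a hypothetical input file would leave one PDF per page in out/. Merging those pages into a single document is left out here because the message does not say how that step is done.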