1. Please also provide example files (input and output).
2. Tesseract does not accept PDF input (it needs an image), so at least
your point 3 seems to be a problem with OCRmyPDF.

Please also provide the output of the "tesseract --version" command.
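
A rough sketch of how that information could be collected, and of how to test Tesseract on an image directly rather than through OCRmyPDF. It assumes pdftoppm and pdftotext (from poppler-utils) are installed, reuses the file name OCR_test_eng.pdf from the original message, and introduces report_version as a purely hypothetical helper:

```shell
# Hypothetical helper: print the first line of a tool's version banner,
# or a short note if the tool is not on PATH.
report_version() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version 2>&1 | head -n 1
    else
        echo "$1: not installed"
    fi
}

report_version tesseract
report_version ocrmypdf

# Assumption: the input file is the one named in the original message.
input=OCR_test_eng.pdf

# Run the direct test only if the file and the tools are actually present.
if [ -f "$input" ] && command -v pdftoppm >/dev/null 2>&1 \
        && command -v tesseract >/dev/null 2>&1; then
    # Render the first page to page.png at 300 dpi.
    pdftoppm -r 300 -png -singlefile "$input" page
    # Let Tesseract OCR the image and write page_ocr.pdf.
    tesseract page.png page_ocr -l eng pdf
    # If a text layer exists, some recognized text is printed here.
    pdftotext page_ocr.pdf - | head
fi
```

If the last command prints recognized text, Tesseract itself is probably working and the problem is more likely in the PDF wrapper or in the viewer used for searching.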

Zdenko


On Mon, 3 Jul 2023 at 21:24, Filippos Koliopanos <kro1...@gmail.com> wrote:

>
> Hello,
>
> I have been trying to make PDFs searchable using OCRmyPDF and Tesseract,
> but despite following recommended steps, I have been unable to get the
> desired results.
>
> Here is a summary of the issues I have faced:
>
> 1. Initially, I tried running OCRmyPDF on a PDF document (created by
> exporting a PNG image to PDF via GIMP) using the command `ocrmypdf -l eng
> OCR_test_eng.pdf outputOCR.pdf`. The process completed without errors, but
> the output PDF was not searchable.
>
> 2. I then updated my Tesseract to version
> 5.3.1+git6228-24da4c71-1ppa1~jammy1, hoping it might resolve the problem.
> However, the issue persisted.
>
> 3. I also attempted using the `--force-ocr` option with OCRmyPDF, but the
> output PDF remained unsearchable. Interestingly, for a scanned PDF
> document, OCRmyPDF indicated that the document already had text, even
> though it was not searchable.
>
> 4. To rule out problems with OCRmyPDF, I tried using pdfsandwich for OCR.
> However, it reported that Tesseract was unable to produce a PDF output
> file, suggesting that the problem might be with Tesseract itself.
>
> 5. I am running these commands on Ubuntu 22.04.2 LTS.
>
> I have had no success with previous attempts at using Tesseract for OCR on
> Linux, and I'm hoping to finally resolve this issue. Any guidance would be
> greatly appreciated.
>
> Best,
> Filippos
> ---
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f84fa2a7-85be-46b8-bbf8-2d7ab605e324n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/f84fa2a7-85be-46b8-bbf8-2d7ab605e324n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

