[tesseract-ocr] Post OCR Verification and Editing

Mark Pellegrino Thu, 07 Mar 2024 11:17:25 -0800

Hello,
I'm trying to check PDFs made with Tesseract 5.2 for correctness using an 
OCR editor, but I can't get them to work properly in either Abbyy or Acrobat.


If I try to open a Tesseract PDF with Abbyy FineReader/OCR Editor, the 
software hangs and then crashes. I can open Tesseract PDFs with Acrobat 
Pro, but when I enable the 'Make OCR text visible' option in Preflight, 
the entire text layer turns into unreadable black boxes. The font used 
shows as 'GlyphLessFont' and appears to be embedded in the file.

It doesn't matter which training data I use or what the source image was; I 
always get these results. PDFs not made with Tesseract work just fine. 
I'm guessing that the issue is a missing font? I don't have much of an 
understanding of how embedded PDF fonts work, and I haven't found 
anything about this in the Tesseract docs. Can someone please point me in 
the right direction? Thanks.

