Hi Mark,

On 08/03/2024 20:24, Mark Pellegrino wrote:
> Thank you Merlijn, this is very helpful. I'm very interested in IA's process so I'll have a deep dive through those tools. This confirms my suspicions that there's no way to use an off-the-shelf text editor with a glyphless font. I'll explore these hOCR editor options. All the best,

As I understand it, the main reason there is no 'editor' for PDFs with text is that the text in PDFs is inherently not structured in a hierarchical manner, so by going from hOCR (or another format) -> PDF text you lose a lot of structure. Even the PDF text reading order might differ per PDF renderer - it's just text rendered in a coordinate space, so it's not a particularly good fit for 'editing'.
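
To make the point concrete, here is a minimal sketch in Python. The hOCR fragment is hand-written for illustration (not real OCR output), and the helper names `bbox` and `flatten` are hypothetical. It flattens the page/paragraph/line/word nesting into bare positioned words, roughly what ends up in a PDF text layer, and then shows that two equally plausible sort orders give two different "reading orders". Real PDF renderers use their own ordering or heuristics; the sorts below only stand in for that.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written hOCR fragment (not output from a real OCR run).
HOCR = """
<div class="ocr_page" title="bbox 0 0 600 800">
  <p class="ocr_par" title="bbox 50 50 550 120">
    <span class="ocr_line" title="bbox 50 50 550 80">
      <span class="ocrx_word" title="bbox 50 50 120 80">Hello</span>
      <span class="ocrx_word" title="bbox 130 50 220 80">world</span>
    </span>
    <span class="ocr_line" title="bbox 50 90 550 120">
      <span class="ocrx_word" title="bbox 50 90 140 120">again</span>
    </span>
  </p>
</div>
""".strip()


def bbox(element):
    """Return (x0, y0, x1, y1) parsed from an hOCR title attribute."""
    for prop in element.get("title", "").split(";"):
        parts = prop.split()
        if parts and parts[0] == "bbox":
            return tuple(int(v) for v in parts[1:5])
    return None


def flatten(root):
    """Drop the page/paragraph/line nesting, keeping only positioned words."""
    return [
        (bbox(el), el.text)
        for el in root.iter()
        if el.get("class") == "ocrx_word"
    ]


root = ET.fromstring(HOCR)
words = flatten(root)

# Two plausible orderings of the same positioned words:
# top-to-bottom then left-to-right, versus left-to-right then top-to-bottom.
row_major = [text for box, text in sorted(words, key=lambda w: (w[0][1], w[0][0]))]
column_major = [text for box, text in sorted(words, key=lambda w: (w[0][0], w[0][1]))]

print(row_major)
print(column_major)
```

Once the words are flattened, nothing records which line or paragraph a word belonged to, so an editor working on the PDF side would have to guess that structure back.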

Regards,
Merlijn
