[tesseract-ocr] Process HOCR Content to generate Docx | Programmatically

Suresh Kumar Sat, 20 Feb 2021 22:52:56 -0800

Team,

I'm currently trying to convert hOCR content (parsed as XML) into a Docx 
file in Java, using docx4j.


Is there any documentation on how to process the hOCR data and transform it 
into Docx?

Note: I'm looking to get the *bbox* info of each *ocr_line* and use it to 
position the words in the docx.
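
To make the bbox step concrete, here is a minimal sketch of reading each 
ocr_line's bbox from hOCR using only the JDK's XML parser. It assumes the hOCR 
is well-formed XHTML (as Tesseract normally writes it) and that the bbox sits 
in the title attribute as "bbox x0 y0 x1 y1" in image pixels. The class name 
HocrLines, the Line type and the pxToTwips helper are made up for illustration, 
and the DPI value has to come from the scanned image, not from the hOCR.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HocrLines {

    /** One ocr_line: bounding box in image pixels plus its text (hypothetical helper type). */
    public static final class Line {
        public final int x0, y0, x1, y1;
        public final String text;

        Line(int x0, int y0, int x1, int y1, String text) {
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
            this.text = text;
        }
    }

    // Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
    private static final Pattern BBOX =
            Pattern.compile("bbox\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)");

    /** Returns every span whose class list contains ocr_line, in document order. */
    public static List<Line> parse(String hocr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the JDK's built-in parser is in use; this feature stops it
            // from fetching the XHTML DTD named in the hOCR DOCTYPE.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(hocr)));

            List<Line> lines = new ArrayList<>();
            NodeList spans = doc.getElementsByTagName("span");
            for (int i = 0; i < spans.getLength(); i++) {
                Element span = (Element) spans.item(i);
                List<String> classes =
                        Arrays.asList(span.getAttribute("class").trim().split("\\s+"));
                if (!classes.contains("ocr_line")) {
                    continue; // skip ocrx_word and other span types
                }
                Matcher m = BBOX.matcher(span.getAttribute("title"));
                if (!m.find()) {
                    continue; // a line without a bbox cannot be positioned
                }
                // Collapse the whitespace between word spans into single spaces.
                String text = span.getTextContent().trim().replaceAll("\\s+", " ");
                lines.add(new Line(
                        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
                        text));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse hOCR", e);
        }
    }

    /** Converts pixels to twips (1/1440 inch) for a given scan resolution. */
    public static long pxToTwips(int px, int dpi) {
        return Math.round(px * 1440.0 / dpi);
    }
}
```

Each Line could then be turned into one paragraph on the docx4j side, with 
pxToTwips(line.x0, dpi) and pxToTwips(line.y0, dpi) as the offsets, since 
WordprocessingML frame positions are given in twips. How to attach those 
offsets in docx4j is left open here.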

I noticed the conversation below, but I want a programmatic way of 
processing, so that I can handle all the OCR data effectively and generate a 
properly formatted docx file.

https://groups.google.com/g/tesseract-ocr/c/tEsQFxct2DI/m/nJYzXTpLAQAJ

Thanks,
Suresh Kumar M

