Team, I am currently trying to convert hOCR content (read with an XML parser) to DOCX using docx4j in Java, in order to generate a .docx file.
Is there any documentation on how to process hOCR data and transform it into DOCX?

Note: I want to read the *bbox* info of each *ocr_line* and use it to position the words in the DOCX. I came across the conversation linked below, but I am looking for a programmatic way of doing this, so that I can process all of the OCR data and generate a properly formatted DOCX file.

https://groups.google.com/g/tesseract-ocr/c/tEsQFxct2DI/m/nJYzXTpLAQAJ

Thanks,
Suresh Kumar M
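
To make the *bbox* step concrete, here is a minimal sketch of reading each *ocr_line* and converting its pixel position to twips (1/1440 inch), the unit WordprocessingML uses for positions. It uses only the JDK's XML parser. The names `HocrLines`, `Line`, `SAMPLE` and `pxToTwips` are hypothetical. The 300 DPI value is an assumption that has to match the scanned image, and the input is assumed to be well-formed XHTML like Tesseract's hOCR output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HocrLines {

    /** One ocr_line: its text and bounding box in image pixels (hypothetical holder type). */
    static final class Line {
        final String text;
        final int x0, y0, x1, y1;

        Line(String text, int x0, int y0, int x1, int y1) {
            this.text = text;
            this.x0 = x0;
            this.y0 = y0;
            this.x1 = x1;
            this.y1 = y1;
        }
    }

    // hOCR keeps geometry in the title attribute, e.g. "bbox 36 92 580 122; baseline 0 -6".
    private static final Pattern BBOX =
            Pattern.compile("bbox\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)");

    // Small made-up hOCR fragment used only for the demo below.
    static final String SAMPLE =
            "<html><body><div class='ocr_page' title='bbox 0 0 2480 3508'>"
            + "<span class='ocr_line' title='bbox 300 600 900 660; baseline 0 -6'>"
            + "<span class='ocrx_word' title='bbox 300 600 520 660; x_wconf 95'>Hello</span> "
            + "<span class='ocrx_word' title='bbox 560 600 900 660; x_wconf 93'>world</span>"
            + "</span>"
            + "<span class='ocr_line' title='bbox 300 700 700 760'>"
            + "<span class='ocrx_word' title='bbox 300 700 700 760'>Second</span>"
            + "</span></div></body></html>";

    /** Collects every span with class ocr_line, in document order. */
    static List<Line> parse(String hocr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Avoid downloading the XHTML DTD that a DOCTYPE in the hOCR file may reference.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(hocr)));

            List<Line> lines = new ArrayList<>();
            NodeList spans = doc.getElementsByTagName("span");
            for (int i = 0; i < spans.getLength(); i++) {
                Element span = (Element) spans.item(i);
                if (!"ocr_line".equals(span.getAttribute("class"))) {
                    continue; // skip words and other span types
                }
                Matcher m = BBOX.matcher(span.getAttribute("title"));
                if (!m.find()) {
                    continue; // line without a bbox: nothing to position
                }
                // Join the text of the nested word spans and collapse whitespace.
                String text = span.getTextContent().trim().replaceAll("\\s+", " ");
                lines.add(new Line(text,
                        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
            }
            return lines;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse hOCR input", ex);
        }
    }

    /** Converts image pixels to twips for a given scan resolution. */
    static long pxToTwips(int px, int dpi) {
        return Math.round(px * 1440.0 / dpi);
    }

    public static void main(String[] args) {
        int dpi = 300; // assumption: must match the resolution of the OCR'd image
        for (Line line : parse(SAMPLE)) {
            System.out.println(line.text + " -> x=" + pxToTwips(line.x0, dpi)
                    + " y=" + pxToTwips(line.y0, dpi) + " twips");
        }
    }
}
```

The resulting twip offsets could then be used on the docx4j side, for example in a page-anchored paragraph frame (`w:framePr` with `w:x` and `w:y`) or a text box per line. That part is not shown here because the right choice depends on how closely the DOCX has to match the original layout.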