[tesseract-ocr] Re: Process HOCR Content to generate Docx | Programmatically

Suresh Kumar Mon, 22 Feb 2021 12:29:39 -0800

Can anyone please help me with this?

On Sunday, February 21, 2021 at 1:52:57 AM UTC-5 Suresh Kumar wrote:


> Team,
>
> I'm currently trying to convert hOCR content (read with an XML parser) 
> into a Docx file in Java, using docx4j.
>
> Is there any documentation on how I can process the hOCR data and 
> transform it into Docx?
>
> Note: I'm looking to get the *bbox* info of each *ocr_line* and use it to 
> position the words in the docx.
>
> I noticed the conversation below, but I want a programmatic way of doing 
> this, so that I can process all of the OCR data effectively and generate 
> a properly formatted docx file.
>
> https://groups.google.com/g/tesseract-ocr/c/tEsQFxct2DI/m/nJYzXTpLAQAJ
>
> Thanks,
> Suresh Kumar M
>
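For the *bbox* part of the question, here is a minimal sketch in plain Java (standard library only) of pulling the bounding box out of every *ocr_line* element. It assumes Tesseract-style hOCR, where each line element carries a title attribute such as "bbox x0 y0 x1 y1; baseline ...". The class name HocrLineBoxes, the Box holder, and the pxToTwips helper are hypothetical names made up for this illustration, and the scan DPI is something you have to supply yourself. The docx4j side is deliberately left out; the sketch only gets the coordinates into the twips unit (1/1440 inch) that Word positioning uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HocrLineBoxes {

    /** Bounding box of one ocr_line in image pixels (hypothetical holder type). */
    static final class Box {
        final int x0, y0, x1, y1;

        Box(int x0, int y0, int x1, int y1) {
            this.x0 = x0;
            this.y0 = y0;
            this.x1 = x1;
            this.y1 = y1;
        }
    }

    // Any opening tag, e.g. <span class='ocr_line' title="bbox ...">.
    private static final Pattern TAG = Pattern.compile("<\\w+\\b[^>]*>");
    // Value of the class attribute (single or double quoted).
    private static final Pattern CLASS_ATTR =
            Pattern.compile("class\\s*=\\s*['\"]([^'\"]*)['\"]");
    // The "bbox x0 y0 x1 y1" property inside the title attribute.
    private static final Pattern BBOX =
            Pattern.compile("bbox\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)");

    /** Returns the bbox of every element whose class list contains ocr_line. */
    static List<Box> lineBoxes(String hocr) {
        List<Box> boxes = new ArrayList<>();
        Matcher tag = TAG.matcher(hocr);
        while (tag.find()) {
            String openTag = tag.group();
            Matcher cls = CLASS_ATTR.matcher(openTag);
            if (!cls.find()) {
                continue; // tag has no class attribute
            }
            List<String> classes = Arrays.asList(cls.group(1).trim().split("\\s+"));
            if (!classes.contains("ocr_line")) {
                continue; // ocr_page, ocr_par, ocrx_word, ...
            }
            Matcher bbox = BBOX.matcher(openTag);
            if (bbox.find()) {
                boxes.add(new Box(
                        Integer.parseInt(bbox.group(1)),
                        Integer.parseInt(bbox.group(2)),
                        Integer.parseInt(bbox.group(3)),
                        Integer.parseInt(bbox.group(4))));
            }
        }
        return boxes;
    }

    /** Converts image pixels to twips (1/1440 inch); dpi is the scan resolution. */
    static long pxToTwips(int px, int dpi) {
        return Math.round(px * 1440.0 / dpi);
    }

    public static void main(String[] args) {
        // Small made-up hOCR fragment, not real Tesseract output.
        String hocr = "<div class='ocr_page' title='bbox 0 0 2480 3508'>"
                + "<span class='ocr_line' id='line_1_1' "
                + "title=\"bbox 300 150 900 200; baseline 0 -6\">"
                + "<span class='ocrx_word' title='bbox 300 150 420 200'>Hello</span>"
                + "</span></div>";
        int dpi = 300; // assumed scan resolution
        for (Box b : lineBoxes(hocr)) {
            System.out.println("line at x=" + pxToTwips(b.x0, dpi)
                    + " y=" + pxToTwips(b.y0, dpi) + " twips");
        }
    }
}
```

A regex scan is used here only to keep the example free of dependencies. If your hOCR is well-formed XHTML, a real XML or HTML parser would be more robust. The same loop works for *ocrx_word* elements if you need word-level positions instead of line-level ones.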

