Many thanks to George Chriss (see above)! My workaround, based on his description, is to modify the hOCR file created by tesseract with an XSLT stylesheet (see below) and then run hocr2pdf 0.8.9 on the result. With that, the text boxes are placed (almost) correctly.
$ tesseract image.tif ocr_file hocr
$ xsltproc -html -nonet -novalid -o ocr_fixed.hocr fix-hocr.xsl ocr_file.hocr
$ hocr2pdf -i image.tif -o searchable.pdf <ocr_fixed.hocr

See the attached file fix-hocr.xsl.

** Attachment added: "use on hocr file to fix for hocr2pdf 0.8.9 textbox placement"
   https://bugs.launchpad.net/cuneiform-linux/+bug/623438/+attachment/4432658/+files/fix-hocr.xsl

--
https://bugs.launchpad.net/bugs/623438
Title: Font size not correct in merged sandvich PDF
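If you need to process many scans, the three commands above can be wrapped in a single shell function. This is only a sketch under stated assumptions: the function name make_searchable_pdf and the intermediate file names (<base>_ocr.hocr, <base>_fixed.hocr) are hypothetical. It also assumes that tesseract, xsltproc and hocr2pdf 0.8.9 are on PATH, that your tesseract writes <name>.hocr for the hocr config (as in the commands above), and that the attached fix-hocr.xsl is in the current directory.

```shell
#!/bin/sh
# Sketch only: wraps the tesseract -> xsltproc -> hocr2pdf workaround.
# Assumes tesseract (writing <name>.hocr), xsltproc and hocr2pdf 0.8.9
# are on PATH. Function and intermediate file names are hypothetical.
set -eu

make_searchable_pdf() {
    image=$1               # input scan, e.g. image.tif
    out=$2                 # output PDF, e.g. searchable.pdf
    xsl=${3:-fix-hocr.xsl} # stylesheet from the bug attachment
    base=${image%.*}       # hypothetical naming: image.tif -> image

    # 1. OCR the image; tesseract appends .hocr to the output base name
    tesseract "$image" "${base}_ocr" hocr
    # 2. Rewrite the hOCR with the XSLT so hocr2pdf places the text boxes
    xsltproc -html -nonet -novalid -o "${base}_fixed.hocr" "$xsl" "${base}_ocr.hocr"
    # 3. Merge image and fixed hOCR (read from stdin) into a searchable PDF
    hocr2pdf -i "$image" -o "$out" < "${base}_fixed.hocr"
}
```

Usage: make_searchable_pdf image.tif searchable.pdf. With set -e, the function stops at the first failing step instead of passing a missing or empty hOCR file on to hocr2pdf.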