Re: [CODE4LIB] PDF with OCR from different source

Rasan Rasch Fri, 08 May 2020 13:09:26 -0700

Hi Kim,

One solution would be to use the pdfimages utility from Poppler to
extract all the images from the PDF into a directory.  You would then
place the corresponding hOCR files in the same directory and run the
hocr-pdf utility from hocr-tools to build a searchable PDF.


Both software packages are readily available on many Linux systems.

https://poppler.freedesktop.org/
https://github.com/tmbdev/hocr-tools
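
A minimal sketch of the two steps, wrapped in Python so the commands
are explicit. It assumes the OCR data is already in hOCR format, that
the page images are stored in the PDF as JPEGs (so pdfimages -j writes
.jpg files), and that hocr-pdf pairs each .jpg with a .hocr file of the
same base name, which is my understanding of the tool. The function
names and the "page" prefix are made up for illustration.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_dir, prefix="page"):
    """Build the pdfimages call that extracts the embedded images."""
    # -j saves JPEG-encoded images as .jpg; images stored in other
    # encodings come out as .ppm/.pbm and would need converting first.
    return ["pdfimages", "-j", str(pdf_path), str(Path(image_dir) / prefix)]


def missing_hocr(image_dir):
    """List .jpg files that have no same-named .hocr file beside them."""
    return sorted(
        jpg.name
        for jpg in Path(image_dir).glob("*.jpg")
        if not jpg.with_suffix(".hocr").exists()
    )


def extract_images(pdf_path, image_dir):
    """Step 1: dump the images from the PDF into image_dir."""
    Path(image_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_command(pdf_path, image_dir), check=True)


def write_searchable_pdf(image_dir, output_pdf):
    """Step 2: run hocr-pdf once the hOCR files sit next to the images."""
    unmatched = missing_hocr(image_dir)
    if unmatched:
        raise FileNotFoundError(f"no hOCR file for: {', '.join(unmatched)}")
    # hocr-pdf writes the combined PDF to standard output.
    with open(output_pdf, "wb") as out:
        subprocess.run(["hocr-pdf", str(image_dir)], stdout=out, check=True)
```

You would call extract_images("scan.pdf", "pages"), copy or rename the
hOCR files so they match the extracted image names (page-000.hocr next
to page-000.jpg, and so on), and then call
write_searchable_pdf("pages", "searchable.pdf").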

Thanks,
Rasan
NYU Digital Library


On Wed, May 6, 2020 at 2:42 PM Kimberly Kennedy <[email protected]>
wrote:

> I have an unusual situation. I've created a PDF that I want to be text
> searchable. However, I would like to use OCR data from a different source
> than that document. Is it possible to add a text file as the OCR layer to
> an existing PDF?
>
> Any ideas would be appreciated!
>
> Thanks,
>
> Kim
>
>
> Kimberly Kennedy
> Digital Production Coordinator
> Northeastern University Library
> [email protected]
>
