There's an open source tool named OCRmyPDF which claims to do what you're trying
to do: see https://github.com/fritz-hh/OCRmyPDF
As far as I understand, it makes use of standard GNU/Linux software and produces
a searchable PDF file (which I take to mean that the text is extractable). I
haven't used this tool myself, but the source code might give you some hints.
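
If it helps, here is an untested sketch of how one might call the tool from
Python. It assumes OCRmyPDF is installed and provides an `ocrmypdf` command that
takes an input path and an output path, as the project's README describes for
current releases; older versions may be invoked differently. The helper names
`build_command` and `make_searchable` are made up for this example.

```python
import shutil
import subprocess
import sys


def build_command(input_pdf, output_pdf):
    # Assumed invocation: ocrmypdf <input.pdf> <output.pdf>
    return ["ocrmypdf", input_pdf, output_pdf]


def make_searchable(input_pdf, output_pdf):
    # Fail early with a clear message if the command is not installed.
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_command(input_pdf, output_pdf), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    make_searchable(sys.argv[1], sys.argv[2])
```

Running the script with a scanned PDF and an output file name should leave a
searchable copy at the output path, provided the assumptions above hold.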
-- 
Regards,
jvp.


