Hi folks,

I have a number of images of typescripts and manuscripts on which it is quite time-consuming to run OCR (or, more accurately, quite time-consuming to correct the OCR output).

For some of these I have text files containing accurate transcriptions. In other cases I have TEI files with these transcriptions.
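For the TEI files, one possible first step is to split each transcription into one plain-text string per page, so that each page's text can be paired with its image. This is a minimal sketch using only Python's standard library. It assumes the files use the standard TEI namespace, that page breaks are marked with `<pb/>` inside `<body>`, and that pages appear in the same order as the images. `tei_pages` is a hypothetical helper name. It keeps all text, including anything inside `<del>` or `<note>`, so those may need filtering.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # standard TEI P5 namespace (assumed)


def tei_pages(source) -> list[str]:
    """Split the TEI <body> into one plain-text string per page, breaking at each <pb/>.

    `source` is a file path or binary file object. Whitespace is collapsed;
    markup is dropped but the text inside it is kept.
    """
    body = ET.parse(source).getroot().find(f".//{TEI}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    pages: list[list[str]] = [[]]

    def walk(elem: ET.Element) -> None:
        if elem.tag == f"{TEI}pb":
            pages.append([])  # a page break starts a new page
        elif elem.text:
            pages[-1].append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text after a child element belongs to the current page
                pages[-1].append(child.tail)

    walk(body)
    result = [" ".join("".join(parts).split()) for parts in pages]
    # Text before the first <pb/> is usually empty; drop it so page 1 is result[0]
    if result and not result[0]:
        result = result[1:]
    return result
```

Called as `tei_pages("letter.xml")` (a hypothetical file name), it returns a list whose first entry is the text of the first page image.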

What is a straightforward way to combine the text with the images (each image overlaid on its text) to create searchable PDFs?

I know my way around the command line and can follow tutorials, but I'm not a programmer, so the more straightforward the solution, the better.

I have had a go with pdftkBuilder, and a result can be seen here: [https://www.dropbox.com/s/fxp6rnt24043aez/result3.pdf]. However, there are a number of problems:

1. It involves 'printing' the text to PDF and 'stamping' the image over it. Unless the image matches a standard paper size, the result has a margin.
2. The underlying text doesn't line up with the image. I would love it to, but I can live with it if it can't.
3. It is very time-consuming. Ideally I would like a solution that could be scripted and left to run.
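One scriptable alternative to printing and stamping is to generate the PDF directly: make each page exactly the size of its scan (no margin), draw the scan full-page, and put the transcription on the page as invisible text (PDF text render mode 3, `3 Tr`), which is how OCR tools typically make scans searchable. This is a rough sketch in standard-library Python, not a tested tool. It assumes JPEG scans (TIFFs would need converting first), a hypothetical default of 300 dpi, and the built-in Helvetica font with a Latin-only encoding, so characters outside cp1252 become "?". The function names are hypothetical. Lines are simply spread evenly down the page, so alignment with the image stays approximate; exact alignment would need word coordinates such as those an OCR engine produces.

```python
import struct


def jpeg_info(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, components) from the JPEG start-of-frame (SOF) marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG marker stream")
        marker = data[i + 1]
        (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF0..SOF15 hold the frame size; C4 (DHT), C8 (JPG) and CC (DAC) are not SOF markers
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            if i + 10 > len(data):
                raise ValueError("truncated SOF segment")
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + seg_len
    raise ValueError("no SOF marker found")


def pdf_text(s: str) -> bytes:
    """Escape a line for a PDF literal string; cp1252 approximates WinAnsiEncoding."""
    s = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return s.encode("cp1252", errors="replace")


def build_pdf(pages: list[tuple[bytes, str]], dpi: float = 300.0) -> bytes:
    """Build a PDF with one page per (jpeg_bytes, transcription) pair.

    Each page is the size of its scan, the scan is drawn full-page, and the
    transcription is laid out as invisible text spread evenly down the page.
    """
    colour = {1: "DeviceGray", 3: "DeviceRGB"}
    kids = " ".join(f"{4 + 3 * i} 0 R" for i in range(len(pages)))
    objects = [  # objects[n - 1] is the body of PDF object n
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {len(pages)} >>".encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>",
    ]
    for i, (jpeg, text) in enumerate(pages):
        img_no, content_no = 5 + 3 * i, 6 + 3 * i
        w_px, h_px, ncomp = jpeg_info(jpeg)
        if ncomp not in colour:
            raise ValueError(f"unsupported JPEG with {ncomp} components")
        w_pt, h_pt = w_px * 72 / dpi, h_px * 72 / dpi  # pixels -> points
        lines = text.splitlines() or [""]
        leading = h_pt / (len(lines) + 1)
        ops = [
            f"q {w_pt:.2f} 0 0 {h_pt:.2f} 0 0 cm /Im1 Do Q".encode(),  # scan, full page
            f"BT /F1 {min(leading, 12):.2f} Tf 3 Tr {leading:.2f} TL 0 {h_pt:.2f} Td".encode(),
        ]
        ops += [b"T* (" + pdf_text(line) + b") Tj" for line in lines]  # next line, show text
        ops.append(b"ET")
        stream = b"\n".join(ops)
        objects.append(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w_pt:.2f} {h_pt:.2f}] "
            f"/Resources << /Font << /F1 3 0 R >> /XObject << /Im1 {img_no} 0 R >> >> "
            f"/Contents {content_no} 0 R >>".encode()
        )
        objects.append(
            f"<< /Type /XObject /Subtype /Image /Width {w_px} /Height {h_px} "
            f"/ColorSpace /{colour[ncomp]} /BitsPerComponent 8 /Filter /DCTDecode "
            f"/Length {len(jpeg)} >>\nstream\n".encode() + jpeg + b"\nendstream"
        )
        objects.append(f"<< /Length {len(stream)} >>\nstream\n".encode() + stream + b"\nendstream")

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj" for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n0000000000 65535 f \n".encode()
    out += b"".join(f"{off:010d} 00000 n \n".encode() for off in offsets)
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\nstartxref\n{xref_pos}\n%%EOF\n".encode()
    return bytes(out)
```

Given matching pairs such as page_001.jpg and page_001.txt (hypothetical names), a short loop can read each pair, call `build_pdf` once with the whole list, and write the bytes to a file, so a whole folder can be left to run unattended.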

Any advice would be greatly appreciated.



--

Padraic


Padraic Stack | Digital Humanities Support Officer | NUI Maynooth | 
padraic.st...@nuim.ie |Phone: Mon: 01 474 7187 Tue - Fri: 01 474 7197
