Re: OCR to Transcribe Text PDF in LaTeX

Timothy Lanfear Sun, 22 Feb 2026 09:53:22 -0800

I have had good success with Tesseract for OCR (https://github.com/tesseract-ocr/tesseract).
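
As a rough illustration of the step after OCR, here is a minimal Python sketch that turns plain text (such as the text Tesseract writes out) into paragraphs that are safe to paste into a LaTeX document body. It is only a sketch under stated assumptions: the function names `escape_latex` and `ocr_text_to_latex` are made up for this example, the rejoining of words hyphenated at line ends is a simple heuristic that will sometimes be wrong, and it does not attempt to recover headings, italics, or footnotes. Note also that Tesseract reads images rather than PDFs, so the scanned pages would first need to be rendered to image files with a separate tool.

```python
import re

# Characters LaTeX treats specially, mapped to safe replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

# One pattern matching any special character, so escaping happens in a
# single pass and replacement text is never escaped a second time.
_ESCAPE_RE = re.compile("|".join(re.escape(c) for c in LATEX_ESCAPES))


def escape_latex(text):
    """Escape LaTeX special characters in plain text (hypothetical helper)."""
    return _ESCAPE_RE.sub(lambda m: LATEX_ESCAPES[m.group(0)], text)


def ocr_text_to_latex(text):
    """Convert OCR plain text into LaTeX paragraphs (hypothetical helper).

    Assumes paragraphs are separated by blank lines in the OCR output.
    """
    # Heuristic: rejoin words split by a hyphen at a line end.
    # This can wrongly merge genuine hyphenated compounds.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split into paragraphs on blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks inside each paragraph, then escape.
    cleaned = [escape_latex(" ".join(p.split())) for p in paragraphs]
    # LaTeX starts a new paragraph at each blank line.
    return "\n\n".join(cleaned)
```

The result could be pasted between `\begin{document}` and `\end{document}` and then proofread against the scan, since OCR errors are likely in a 700-page book.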

If you don't fancy learning LaTeX, you could take a look at OOoLilyPond (https://ooolilypond.sourceforge.net/).


On 21/02/2026 11:35, Gabriel Ellsworth wrote:

Here is my situation.

 1. I am trying to typeset a new edition of a public-domain book.
 2. I have a PDF that contains a scanned copy of a 20th-century
    printing of this book (about 700 pages).
 3. My output will contain a bit of LilyPond output, but music
    notation will not be “the main actor” (to borrow Lucas’s very apt
    phrase below). I estimate that the book will be 97% text and 3%
    LilyPond.
 4. Based on past helpful input from this list, I suspect that LaTeX
    will be the best way to create this book.
 5. I have never used LaTeX before.
 6. I know almost nothing about how OCR software or AI works on the
    back end.

My question:
Is there a good program or site out there that can take my existing PDF, “read” it, and help me transcribe it in (convert it to) LaTeX code?
The “3% music” portion of my output will be easy for me to code myself in LilyPond. But I’m hoping to save several hours of work coding the “97% text” component of this 700-page book.
Gabriel

--
Timothy Lanfear, Bristol, UK.
