I have had good success with Tesseract for OCR (https://github.com/tesseract-ocr/tesseract).
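Bear in mind that Tesseract gives you plain text, not LaTeX, so there is still a conversion step. As far as I know, Tesseract reads page images rather than PDFs, so the scans would first need exporting as images (for example with pdftoppm), then each image run through something like `tesseract page.png page`, which writes page.txt. What follows is only a rough sketch of that last step. It assumes paragraphs in the OCR output are separated by blank lines. It escapes LaTeX's special characters and rejoins words hyphenated at line ends. The function names and the choice of the `book` class are my own, hypothetical choices.

```python
import re

# Characters with special meaning in LaTeX, mapped to safe replacements.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape each LaTeX special character in plain text."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def ocr_text_to_latex(text: str) -> str:
    """Wrap OCR plain text in a minimal LaTeX document (hypothetical helper).

    Assumes blank lines separate paragraphs. Words split by a hyphen at the
    end of a line are rejoined, which will also wrongly join genuine
    compounds that happened to break at the hyphen.
    """
    # Rejoin "transcrip-\ntion" style line-end hyphenation.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines and drop empty chunks.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Collapse internal line breaks (and form feeds) into single spaces.
    body = "\n\n".join(escape_latex(" ".join(p.split())) for p in paragraphs)
    return (
        "\\documentclass{book}\n"
        "\\begin{document}\n\n"
        f"{body}\n\n"
        "\\end{document}\n"
    )


if __name__ == "__main__":
    sample = "It costs 5% of the\nbook's transcrip-\ntion.\n\nSecond paragraph & more."
    print(ocr_text_to_latex(sample))
```

You would still have to add chapter headings, footnotes and so on by hand, and proof-read the OCR errors, but it saves retyping the bulk of the text.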

If you don't fancy learning LaTeX, you could take a look at OOoLilyPond (https://ooolilypond.sourceforge.net/).

On 21/02/2026 11:35, Gabriel Ellsworth wrote:

Here is my situation.

 1. I am trying to typeset a new edition of a public-domain book.
 2. I have a PDF that contains a scanned copy of a 20th-century
    printing of this book (about 700 pages).
 3. My output will contain a bit of LilyPond output, but music
    notation will not be “the main actor” (to borrow Lucas’s very apt
    phrase below). I estimate that the book will be 97% text and 3%
    LilyPond.
 4. Based on past helpful input from this list, I suspect that LaTeX
    will be the best way to create this book.
 5. I have never used LaTeX before.
 6. I know almost nothing about how OCR software or AI works on the
    back end.

My question:

Is there a good program or site out there that can take my existing PDF, “read” it, and help me transcribe it into (convert it to) LaTeX code?


The “3% music” portion of my output will be easy for me to code myself in LilyPond. But I’m hoping to save several hours of work coding the “97% text” component of this 700-page book.


Gabriel

--
Timothy Lanfear, Bristol, UK.
