I have had good success with Tesseract for OCR
(https://github.com/tesseract-ocr/tesseract).
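For what it is worth, here is a rough sketch of how a scanned PDF could be fed through Tesseract from Python. It rests on some assumptions that are not from the thread: that the Poppler tool pdftoppm is installed to turn PDF pages into images (Tesseract reads images, not PDFs), that the book is in English (`-l eng`), and that 300 dpi is a sensible resolution. The function names are purely illustrative. The result is plain text, not LaTeX, so markup would still need to be added afterwards.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf, prefix, dpi=300):
    """Build the Poppler command that renders each PDF page to a PNG.

    Assumption: pdftoppm (from Poppler) is the rasteriser; 300 dpi is a guess.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_command(image, lang="eng"):
    """Build the Tesseract command that writes <image stem>.txt beside the image."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_pdf(pdf, workdir, lang="eng"):
    """Render the PDF to page images, OCR each page, and return the joined text."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes page-001.png, page-002.png, ... (zero-padded, so sorting works)
    subprocess.run(pdftoppm_command(pdf, workdir / "page"), check=True)
    pages = sorted(workdir.glob("page-*.png"))
    for png in pages:
        subprocess.run(tesseract_command(png, lang), check=True)
    return "\n\n".join(
        p.with_suffix(".txt").read_text(encoding="utf-8") for p in pages
    )
```

Something like `text = ocr_pdf("book.pdf", "ocr_pages")` would then give one long string to proofread and mark up. For 700 pages it may be worth trying a handful of pages first to judge the OCR quality.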
If you don't fancy learning LaTeX, you could take a look at OOoLilyPond
(https://ooolilypond.sourceforge.net/).
On 21/02/2026 11:35, Gabriel Ellsworth wrote:
Here is my situation.
1. I am trying to typeset a new edition of a public-domain book.
2. I have a PDF that contains a scanned copy of a 20th-century
printing of this book (about 700 pages).
3. My output will contain a bit of LilyPond output, but music
notation will not be “the main actor” (to borrow Lucas’s very apt
phrase below). I estimate that the book will be 97% text and 3%
LilyPond.
4. Based on past helpful input from this list, I suspect that LaTeX
will be the best way to create this book.
5. I have never used LaTeX before.
6. I know almost nothing about how OCR software or AI works on the
back end.
My question:
Is there a good program or site out there that can take my existing
PDF, “read” it, and help me transcribe it in (convert it to) LaTeX code?
The “3% music” portion of my output will be easy for me to code myself
in LilyPond. But I’m hoping to save several hours of work coding the
“97% text” component of this 700-page book.
Gabriel
--
Timothy Lanfear, Bristol, UK.