I can't help with Tesseract advice. When I wanted to do the same thing, I found it easier to write a custom OCR for this specific problem from scratch. It's very much an experiment and a work in progress (although I'm afraid I've not worked on it for about a year), but you might find something helpful in the discussion or the code: https://retrocomputingforum.com/t/custom-ocr-for-printer-listings/4016 and http://gtoal.com/src/OCR/
However, you *will* need to do better scans using a flatbed scanner if you still have access to the originals. Those scans are unusable: the pages in the recent one had not been laid flat; it looks like they were taken with an overhead camera.

Graham

On Mon, Feb 10, 2025 at 4:29 AM Mixotricha <connolly.dam...@gmail.com> wrote:
>
> I have a question about using Tesseract for trying to recover some source
> code of a printed listing that most likely would have come off a line
> printer in the early 70's probably scanned in by photocopier and then more
> recently by a more modern digital scanner.
>
> I have two copies of the document. One the original scan and another that
> was recently scanned for me by the archive area of the University that
> houses the document. Unfortunately both have different problems!
>