Thanks, that is a really helpful link. Unfortunately I do not have much chance of getting better documents. The second scan came from a helpful archivist at an installation that requires a classification to enter. Otherwise I would literally get on a plane and go and look myself. I was gratified that they were as helpful as they were.

Really the halting point in this translation is not the human words. It is the jump vectors (the goto statements), so now I am back to seeing if I can figure out some sort of relationship among the jump vectors in the left-hand column. Unfortunately they do not match the line numbers on the right-hand side, but maybe I have just not figured out what that relationship might be. Basically, I am back to searching for context.

Some other things are in my favour: the thesis itself is an excellent piece of work, really well explained, and it includes what are basically unit tests that are themselves quite legible. I feel getting this code back is right on the edge of possibility if I just think about it a bit more.

On Tuesday, February 11, 2025 at 5:33:36 AM UTC+11 gt...@gtoal.com wrote:

> I can't help with tesseract advice - when I wanted to do the same thing I
> found it easier to write a custom OCR for this specific problem from
> scratch. It's very much an experiment and a work-in-progress (although
> I've not worked on it for about a year I'm afraid) but you might find
> something helpful from the discussion or the code:
> https://retrocomputingforum.com/t/custom-ocr-for-printer-listings/4016
> and http://gtoal.com/src/OCR/
>
> However you *will* need to do better scans using a flatbed scanner if you
> still have access to the originals. Those scans are unusable - the pages in
> the recent one had not been laid flat - it looks like they were taken with
> an overhead camera..
>
> Graham
>
> On Mon, Feb 10, 2025 at 4:29 AM Mixotricha <connoll...@gmail.com> wrote:
>
>>
>> I have a question about using Tesseract for trying to recover some source
>> code of a printed listing that most likely would have come off a line
>> printer in the early 70's probably scanned in by photocopier and them more
>> recently by a more modern digital scanner.
>>
>> I have two copies of the document. One the original scan and another that
>> was recently scanned for me by the archive area of the University that
>> houses the document. Unfortunately both have different problems!
>>