On Tue, Feb 11, 2025 at 5:52 PM Mixotricha <connolly.dam...@gmail.com> wrote:
> Thanks that is a really helpful link. Unfortunately I do not have much
> chance of getting better documents. The second scan came from a helpful
> archivist at an installation that requires a classification to enter.
> Otherwise I would literally get on a plane and go and look myself. I was
> gratified that they were as helpful as they were. Really the halting point
> in this translation is not the human words. It is the jump vectors ( the
> goto statements ) and so now I am back to seeing if I can figure out some
> sort of relationship in the jump vectors in the left hand column.
> Unfortunately they do not match the line numbers on the right hand side.
> But maybe I have just not figured out what that relationship might be.
> Basically back to searching for context. Some other things in my favour are
> that the thesis itself is an excellent piece of work really well explained
> and has what are basically unit tests included that are themselves quite
> legible. I feel getting this code back is right on the edge of possibility
> if I just think about it a bit more.
>

I sympathise on the access problem - we submitted a bunch of listings and docs to our local museum for safe keeping and haven't seen them since. I guess they're being kept very safe :-/

But don't give up hope on getting better access. I was quite impressed that the folks working on restoring the Bloodhound at bmpg.org.uk were able to get access to the original Coral 66 source code. I myself managed to get the MOD's Defence Procurement Agency to give me permission to post the Coral 66 manual, just by asking via the contact page at the HMSO. So you never know... sometimes these people can be surprisingly reasonable.

So my fixed-pitch stuff isn't going to help you. I have two other suggestions: 1) classic re-keying by 2 or 3 independent people (if 2, then someone has to go over the differences and explicitly make a selection; if 3, use a 2 out of 3 consensus to pick the preferred version.
Neither is foolproof, but both considerably lower the rate of errors); and 2) there's some experimental dewarping software worth trying such as https://mzucker.github.io/2016/08/15/page-dewarping.html which might be better than the sort of software used in things like CZUR scanners, which have a very specific model of a V-shaped spine between pages of a book.

Looking at your hand-tidied source I would expect that a custom Fortran parser could find a lot of corrections, simply by keeping a name and frequency table of variables - to catch things like CCMREG vs COMREG, for example - and automatically suggesting the preferred version. I found that a hacked-up parser for Algol 60 was extremely helpful at that sort of correction, leaving only a few minor errors to catch using a real compiler once the sources were cleaned up enough to be compilable.

Good luck with your project.

G
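
P.S. In case it helps, here is a rough sketch of the 2 out of 3 idea in Python. It assumes each person keys the listing line for line, so the three files have the same line structure, and it votes on whole lines rather than on characters. The function name and the sample lines are invented for illustration and are not taken from your thesis.

```python
from collections import Counter
from itertools import zip_longest


def consensus(a, b, c):
    """Line-by-line 2-out-of-3 vote over three independent keyings.

    Returns (merged, unresolved), where unresolved holds the 1-based
    numbers of lines on which no two keyings agreed.
    """
    merged, unresolved = [], []
    for n, versions in enumerate(zip_longest(a, b, c, fillvalue=""), start=1):
        # Trailing blanks are ignored when comparing versions.
        stripped = [v.rstrip() for v in versions]
        line, votes = Counter(stripped).most_common(1)[0]
        if votes >= 2:
            merged.append(line)
        else:
            merged.append(stripped[0])  # placeholder only: a human must decide
            unresolved.append(n)
    return merged, unresolved


# Invented sample: each keying has at most one slip, in different places.
keying_a = ["      COMREG = 0", "      GO TO 120", "  120 CONTINUE"]
keying_b = ["      COMREG = 0", "      GO TO 12O", "  120 CONTINUE"]
keying_c = ["      CCMREG = 0", "      GO TO 120", "  120 CONTINUE"]

merged, unresolved = consensus(keying_a, keying_b, keying_c)
print("\n".join(merged))
print("lines needing a human decision:", unresolved)
```

Voting on whole lines is the crudest version. If two people make the same slip on the same line it will still go through, which is why the method lowers the error rate rather than removing errors altogether.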
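
For the name and frequency table, something along these lines might be a starting point. It is only a sketch and not a real Fortran parser: it assumes fixed-form source with comment lines marked by C or * in column 1, uses a small and incomplete keyword list that I made up, ignores text inside FORMAT statements, and treats any name seen only once as suspect. The function names, the thresholds and the sample lines are all invented for illustration.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Incomplete, hand-picked keyword list so statement words are not counted.
KEYWORDS = {
    "CALL", "COMMON", "CONTINUE", "DIMENSION", "DO", "END", "FORMAT",
    "GO", "GOTO", "IF", "READ", "RETURN", "STOP", "SUBROUTINE", "TO", "WRITE",
}
NAME = re.compile(r"\b[A-Z][A-Z0-9]*\b")


def name_table(lines):
    """Count how often each identifier-like token appears."""
    counts = Counter()
    for line in lines:
        if line[:1] in ("C", "*"):  # fixed-form comment line
            continue
        for name in NAME.findall(line.upper()):
            if name not in KEYWORDS:
                counts[name] += 1
    return counts


def suggest(counts, rare=1, cutoff=0.8):
    """Map each rarely seen name to a similar-looking, more frequent one."""
    common = [n for n, k in counts.items() if k > rare]
    fixes = {}
    for name, k in counts.items():
        if k <= rare:
            match = get_close_matches(name, common, n=1, cutoff=cutoff)
            if match:
                fixes[name] = match[0]
    return fixes


# Invented sample, not from the thesis.
sample = [
    "C     INVENTED EXAMPLE",
    "      COMREG = 0",
    "      DO 10 I = 1, N",
    "      COMREG = COMREG + I",
    "   10 CONTINUE",
    "      WRITE (6, 100) CCMREG",
]
print(suggest(name_table(sample)))  # prints {'CCMREG': 'COMREG'}
```

The suggestions are only candidates. A name that really is used once will be left alone unless it happens to look like a more common one, so every proposed change still needs checking against the scan.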