Thanks, that is a really helpful link. Unfortunately, I do not have much 
chance of getting better documents. The second scan came from a helpful 
archivist at an installation that requires a security clearance to enter; 
otherwise I would literally get on a plane and go and look myself. I was 
gratified that they were as helpful as they were.

The real sticking point in this translation is not the human-readable 
words. It is the jump vectors (the goto statements), so I am now back to 
seeing whether I can figure out some relationship among the jump vectors 
in the left-hand column. Unfortunately, they do not match the line numbers 
on the right-hand side, but maybe I have just not worked out what that 
relationship is. Basically, I am back to searching for context.

Some other things in my favour: the thesis itself is an excellent piece of 
work, really well explained, and it includes what are basically unit tests 
that are themselves quite legible. I feel that getting this code back is 
right on the edge of possibility if I just think about it a bit more. 
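
One cheap way to probe the jump-vector question is to check whether the 
left-hand jump targets line up with the right-hand line numbers under a 
constant offset. The following is only a minimal sketch under that 
assumption: the function name best_offsets and the sample numbers are 
hypothetical, not taken from the listing, and the real relationship may 
well not be a simple offset at all.

```python
from collections import Counter


def best_offsets(jump_targets, line_numbers, top=3):
    """Rank constant offsets d by how many jump targets t have t + d
    present among the line numbers. Hypothetical helper for illustration."""
    lines = set(line_numbers)
    # Tally every (line - target) difference; a real offset shows up often.
    counts = Counter(
        line - target for target in set(jump_targets) for line in lines
    )
    return counts.most_common(top)


# Made-up sample values, not from the actual listing.
targets = [10, 20, 35]
lines = [100, 105, 110, 120, 135, 140]
print(best_offsets(targets, lines, top=1))
```

If one offset accounts for nearly all targets, that is a hint worth 
following; if the counts are flat, the mapping is probably something else 
(for example addresses rather than line numbers).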

On Tuesday, February 11, 2025 at 5:33:36 AM UTC+11 gt...@gtoal.com wrote:

> I can't help with tesseract advice - when I wanted to do the same thing I 
> found it easier to write a custom OCR for this specific problem from 
> scratch.  It's very much an experiment and a work-in-progress (although 
> I've not worked on it for about a year I'm afraid) but you might find 
> something helpful from the discussion or the code: 
> https://retrocomputingforum.com/t/custom-ocr-for-printer-listings/4016 
> and http://gtoal.com/src/OCR/
>
> However you *will* need to do better scans using a flatbed scanner if you 
> still have access to the originals. Those scans are unusable - the pages in 
> the recent one had not been laid flat - it looks like they were taken with 
> an overhead camera.
>
> Graham
>
> On Mon, Feb 10, 2025 at 4:29 AM Mixotricha <connoll...@gmail.com> wrote:
>
>>
>> I have a question about using Tesseract for trying to recover some source 
>> code of a printed listing that most likely would have come off a line 
>> printer in the early 70's probably scanned in by photocopier and then more 
>> recently by a more modern digital scanner. 
>>
>> I have two copies of the document. One the original scan and another that 
>> was recently scanned for me by the archive area of the University that 
>> houses the document. Unfortunately both have different problems!
>>
>  
>
