Re: Typing in lost code

2022-01-24 Thread Douglas Taylor via cctalk
I've tried to OCR old Fortran Code from DTIC pdf documents. There were 2 big problems:
1. The copies are very poor to start with and all OCR attempts produced about 75% error rate.
2. Old Fortran code limited variable names to 6 characters so they were generally not descriptive of what they r

Re: Typing in lost code

2022-01-24 Thread Dennis Boone via cctalk
> That's true generally. Anything other than actual photographs
> (continuous tone images) should NOT be run through JPEG because JPEG
> is not intended for, and unfit for, anything else. Printouts, line
> drawings, and anything else with crisp edges between dark and light
> will be messed u

Re: Typing in lost code

2022-01-24 Thread Paul Koning via cctalk
> On Jan 24, 2022, at 5:57 PM, ben via cctalk wrote:
>
>> ...
> Document source is also a problem.
> You would want to keep scan it at the best data format,
> not something in a lossey format.

That's true generally. Anything other than actual photographs (continuous tone images) should NOT
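To make the "not JPEG" advice concrete for printouts: a minimal sketch, assuming the scan has already been decoded into rows of 8-bit grayscale pixel values (the loading step is omitted), that thresholds each pixel to pure black or white and serializes the result as plain PBM, a simple lossless bilevel format. The cutoff of 128 and the sample pixel values are illustrative assumptions, not values anyone in the thread used.

```python
def threshold(gray_rows, cutoff=128):
    """Map 8-bit grayscale rows to bilevel rows: 1 = black ink, 0 = white paper.

    `cutoff` is an assumed value; faded listings may need a different one.
    """
    return [[1 if p < cutoff else 0 for p in row] for row in gray_rows]


def to_pbm(bits):
    """Serialize bilevel rows as a plain-text (P1) PBM image, a lossless format.

    In PBM, 1 means black and 0 means white, matching threshold() above.
    """
    height, width = len(bits), len(bits[0])
    lines = ["P1", f"{width} {height}"]
    lines += [" ".join(str(b) for b in row) for row in bits]
    return "\n".join(lines) + "\n"


# Hypothetical 4x3 patch of a scan: one dark vertical stroke on light paper.
patch = [
    [230, 40, 225, 240],
    [235, 35, 220, 238],
    [228, 50, 230, 236],
]
print(to_pbm(threshold(patch)))
```

Bilevel TIFF or PNG would serve the same purpose; the point is that thresholding yields hard edges and lossless storage keeps them, whereas JPEG's block-based compression introduces ringing artifacts around exactly those edges.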

Re: Typing in lost code

2022-01-24 Thread ben via cctalk
On 2022-01-23 12:47 p.m., Chuck Guzis via cctalk wrote:
> On 1/23/22 10:16, Paul Koning via cctalk wrote:
>> Maybe. But OCR programs have had learning features for decades. I've spent quite a lot of time in FineReader learning mode. Material produced on a moderate-quality typewriter, like the C

Re: Typing in lost code

2022-01-23 Thread dwight via cctalk
dwight via cctalk
Sent: Sunday, January 23, 2022 10:06 AM
To: cctalk@classiccmp.org
Subject: Re: Typing in lost code

It is unlikely that any current day OCR will produce an error free listing. It is possible to train an AI to do this but it requires specific training. It must be on the specifi

Re: Typing in lost code

2022-01-23 Thread Chuck Guzis via cctalk
On 1/23/22 10:16, Paul Koning via cctalk wrote:
> Maybe. But OCR programs have had learning features for decades. I've spent
> quite a lot of time in FineReader learning mode. Material produced on a
> moderate-quality typewriter, like the CDC 6600 wire lists on Bitsavers, can
> be handled t

Re: Typing in lost code

2022-01-23 Thread Lars Brinkhoff via cctalk
Noel Chiappa wrote:
> https://walden-family.com/impcode/imp-code.pdf
> Someone's already done the specialist OCR to deal with faded program
> listings.

I tried to contact the author about converting some of the other IMP listings, but got no reply.

Re: Typing in lost code

2022-01-23 Thread Paul Koning via cctalk
> On Jan 23, 2022, at 12:09 PM, Gavin Scott wrote:
>
> On Sun, Jan 23, 2022 at 9:11 AM Paul Koning via cctalk
> wrote:
>> One consideration is the effort required to repair transcription errors.
>> Those that produce syntax errors aren't such an issue;
>> those that pass the assembler or co
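Errors that still assemble cleanly, such as a mistyped register number, are the expensive kind. One common mitigation (not one proposed in the excerpt above) is double-keying: two typists transcribe the same page independently, and every line where their copies disagree is checked against the scan. A minimal sketch, assuming each transcription is a list of lines in listing order; the PDP-11-style sample lines are hypothetical.

```python
from itertools import zip_longest


def disagreements(copy_a, copy_b):
    """Yield (line_no, a, b) wherever two independent transcriptions differ.

    Line numbers are 1-based. A line present in only one copy is paired
    with None, so missing or extra lines are reported as well.
    """
    for n, (a, b) in enumerate(zip_longest(copy_a, copy_b), start=1):
        if a != b:
            yield n, a, b


def first_diff_column(a, b):
    """Return the 1-based column of the first differing character, or None."""
    for col, (x, y) in enumerate(zip_longest(a, b), start=1):
        if x != y:
            return col
    return None


# Hypothetical transcriptions of the same page by two typists.
page_a = ["MOV R1,R2", "ADD #4,R3", "BR LOOP"]
page_b = ["MOV R1,R2", "ADD #4,R5"]
for n, a, b in disagreements(page_a, page_b):
    print(n, a, b, first_diff_column(a or "", b or ""))
```

The weakness is that an identical mistake made by both typists slips through, so this reduces, rather than eliminates, the silent errors described above.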

Re: Typing in lost code

2022-01-23 Thread dwight via cctalk
ss.

Dwight

From: cctalk on behalf of Noel Chiappa via cctalk
Sent: Sunday, January 23, 2022 9:31 AM
To: cctalk@classiccmp.org
Cc: j...@mercury.lcs.mit.edu
Subject: Re: Typing in lost code

> From: Gavin Scott
> I think if I had a whole lot of old faded greenbar

Re: Typing in lost code

2022-01-23 Thread Gavin Scott via cctalk
On Sun, Jan 23, 2022 at 11:31 AM Noel Chiappa via cctalk wrote:
> See:
>
> https://walden-family.com/impcode/imp-code.pdf
>
> Someone's already done the specialist OCR to deal with faded program listings.

Neat. Though all the complex character recognition part of that work is now like 15-20 lin
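Line-printer listings are unusually friendly to recognition because every character sits in a fixed-pitch cell. As an illustration only (this is a simple nearest-template classifier, not the method of the linked IMP paper, which the excerpt does not describe), a sketch that cuts a bilevel page into cells and picks the known glyph with the fewest differing pixels. The 3x3 glyph bitmaps and the names `TEMPLATES`, `split_cells`, and `classify` are hypothetical.

```python
def hamming(a, b):
    """Number of pixels that differ between two equal-sized bilevel bitmaps.

    Bitmaps are lists of equal-length strings of '0' (paper) and '1' (ink).
    """
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def classify(cell, templates):
    """Return the character whose template bitmap differs least from `cell`."""
    return min(templates, key=lambda ch: hamming(cell, templates[ch]))


def split_cells(page, cell_w, cell_h):
    """Cut a fixed-pitch page bitmap into text lines of character cells."""
    return [
        [
            [row[x:x + cell_w] for row in page[y:y + cell_h]]
            for x in range(0, len(page[0]), cell_w)
        ]
        for y in range(0, len(page), cell_h)
    ]


# Hypothetical 3x3 glyphs; real line-printer glyphs would be larger bitmaps.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "-": ["000", "111", "000"],
    " ": ["000", "000", "000"],
}

# One text line, three cells wide: a faded "I" (bottom pixel lost), "-", blank.
page = ["010000000", "010111000", "000000000"]
print("".join(classify(c, TEMPLATES) for c in split_cells(page, 3, 3)[0]))
```

Because the match is "closest", not "exact", a faded glyph with a few missing pixels still resolves to the right character, which is the property that matters for worn ribbons and greenbar.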

Re: Typing in lost code

2022-01-23 Thread Noel Chiappa via cctalk
> From: Gavin Scott
> I think if I had a whole lot of old faded greenbar etc. ... Someone may
> even have done this already

See: https://walden-family.com/impcode/imp-code.pdf

Someone's already done the specialist OCR to deal with faded program listings.

Noel

Re: Typing in lost code

2022-01-23 Thread Gavin Scott via cctalk
On Sun, Jan 23, 2022 at 9:11 AM Paul Koning via cctalk wrote:
> One consideration is the effort required to repair transcription errors.
> Those that produce syntax errors aren't such an issue;
> those that pass the assembler or compiler but result in bugs (say, a mistyped
> register number) ar

Re: Typing in lost code

2022-01-23 Thread Jonathan Chapman via cctalk
I recently dealt with this with the DaJen SCI monitor listing out of the manual. The copy is pretty bad, and either their printer was having issues, or slashing of "zero" vs "O" was inconsistent somehow. OCRing it produced more of a mess than just sitting with the original and a text editor open
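The zero-versus-letter-O confusion can at least be narrowed down mechanically after OCR or typing. A minimal sketch, assuming 8080-style assembly in uppercase where numeric constants start with a digit and may carry an `H` hex suffix (the thread does not state the listing's syntax, so this is an assumption): flag any constant-like token that mixes digits with the letter O. A leading O is also allowed in the pattern so that a zero misread at the start of a constant is caught.

```python
import re

# Assumed 8080-style constants such as 0FFH or 100H. A leading letter O is
# permitted so that a zero misread as O at the start (O7FH) is also found.
CONSTANT = re.compile(r"\b[0-9O][0-9A-FO]*H?\b")


def suspicious_constants(line):
    """Return constant-like tokens that mix digits with the letter O.

    O is never a hex digit, so an O inside a constant is most likely a zero
    that was misread or mistyped. Caveat: Intel-syntax assemblers also accept
    O as an octal suffix (e.g. 17O), so hits are candidates for manual review.
    """
    return [
        tok for tok in CONSTANT.findall(line)
        if "O" in tok and any(c.isdigit() for c in tok)
    ]


for line in ["MVI A,0FFH", "LXI H,1O0H", "JMP O7FH"]:
    print(line, suspicious_constants(line))
```

Mnemonics such as `ORA` or `OUT` are not flagged because the pattern requires a word boundary after the token and a digit somewhere inside it.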

Re: Typing in lost code

2022-01-23 Thread Paul Koning via cctalk
I've run into that situation too, with listings so difficult that even a commercial OCR program (FineReader) couldn't handle it. At the time Tesseract was far less capable, though I haven't tried it recently to see if that has changed. Anyway, my experience was that the task was hard enough th

Re: Typing in lost code

2022-01-23 Thread Mark Kahrs via cctalk
No, OCR totally fails on olde line printer listing. At least the ones I've tried (tesseract, online, ...)

On Sat, Jan 22, 2022 at 8:06 PM Ethan O'Toole wrote:
>
> Can the listings be OCR'ed?
>
> - Ethan
>
>
>
> Has anyone ever used Amazon Mechanical Turk to employ typi

Re: Typing in lost code

2022-01-22 Thread Ethan O'Toole via cctalk
Can the listings be OCR'ed?

- Ethan

> Has anyone ever used Amazon Mechanical Turk to employ typists to type in old
> listings of lost code? Asking for a friend.

--
: Ethan O'Toole

Typing in lost code

2022-01-22 Thread Mark Kahrs via cctalk
Has anyone ever used Amazon Mechanical Turk to employ typists to type in old listings of lost code? Asking for a friend.