I've tried to OCR old Fortran code from DTIC PDF documents. There were
two big problems:
1. The copies are very poor to start with, and all OCR attempts produced
about a 75% error rate.
2. Old Fortran code limited variable names to 6 characters so they were
generally not descriptive of what they r
> That's true generally. Anything other than actual photographs
> (continuous tone images) should NOT be run through JPEG because JPEG
> is not intended for, and unfit for, anything else. Printouts, line
> drawings, and anything else with crisp edges between dark and light
> will be messed u
> On Jan 24, 2022, at 5:57 PM, ben via cctalk wrote:
>
>> ...
> Document source is also a problem.
> You would want to scan it in the best data format,
> not something in a lossy format.
That's true generally. Anything other than actual photographs (continuous tone
images) should NOT
On 2022-01-23 12:47 p.m., Chuck Guzis via cctalk wrote:
On 1/23/22 10:16, Paul Koning via cctalk wrote:
Maybe. But OCR programs have had learning features for decades. I've spent quite a lot of time in
FineReader learning mode. Material produced on a moderate-quality typewriter, like the C
dwight via cctalk
Sent: Sunday, January 23, 2022 10:06 AM
To: cctalk@classiccmp.org
Subject: Re: Typing in lost code
It is unlikely that any current-day OCR will produce an error-free listing.
It is possible to train an AI to do this but it requires specific training. It
must be on the specifi
On 1/23/22 10:16, Paul Koning via cctalk wrote:
> Maybe. But OCR programs have had learning features for decades. I've spent
> quite a lot of time in FineReader learning mode. Material produced on a
> moderate-quality typewriter, like the CDC 6600 wire lists on Bitsavers, can
> be handled t
Noel Chiappa wrote:
> https://walden-family.com/impcode/imp-code.pdf
> Someone's already done the specialist OCR to deal with faded program
> listings.
I tried to contact the author about converting some of the other IMP
listings, but got no reply.
> On Jan 23, 2022, at 12:09 PM, Gavin Scott wrote:
>
> On Sun, Jan 23, 2022 at 9:11 AM Paul Koning via cctalk
> wrote:
>> One consideration is the effort required to repair transcription errors.
>> Those that produce syntax errors aren't such an issue;
>> those that pass the assembler or co
ss.
Dwight
From: cctalk on behalf of Noel Chiappa via
cctalk
Sent: Sunday, January 23, 2022 9:31 AM
To: cctalk@classiccmp.org
Cc: j...@mercury.lcs.mit.edu
Subject: Re: Typing in lost code
> From: Gavin Scott
> I think if I had a whole lot of old faded greenbar
On Sun, Jan 23, 2022 at 11:31 AM Noel Chiappa via cctalk
wrote:
> See:
>
> https://walden-family.com/impcode/imp-code.pdf
>
> Someone's already done the specialist OCR to deal with faded program listings.
Neat. Though all the complex character recognition part of that work
is now like 15-20 lin
> From: Gavin Scott
> I think if I had a whole lot of old faded greenbar etc. ... Someone may
> even have done this already
See:
https://walden-family.com/impcode/imp-code.pdf
Someone's already done the specialist OCR to deal with faded program listings.
Noel
On Sun, Jan 23, 2022 at 9:11 AM Paul Koning via cctalk
wrote:
> One consideration is the effort required to repair transcription errors.
> Those that produce syntax errors aren't such an issue;
> those that pass the assembler or compiler but result in bugs (say, a mistyped
> register number) ar
I recently dealt with this with the DaJen SCI monitor listing out of the
manual. The copy is pretty bad, and either their printer was having issues, or
slashing of "zero" vs "O" was inconsistent somehow. OCRing it produced more of
a mess than just sitting with the original and a text editor open
I've run into that situation too, with listings so difficult that even a
commercial OCR program (FineReader) couldn't handle it. At the time Tesseract
was far less capable, though I haven't tried it recently to see if that has
changed.
Anyway, my experience was that the task was hard enough th
No, OCR totally fails on olde line printer listings. At least the ones I've
tried (Tesseract, online, ...)
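
For whatever text does come out of OCR or hand typing, the zero vs "O" problem mentioned in this thread could be narrowed down with a small review aid. The following is only a rough sketch, not something anyone here has reported using: the function name `flag_suspect_tokens` and the list of look-alike characters are my own assumptions. It over-flags on purpose (ordinary hex constants like 100H get listed too), so a person still has to check each hit against the original listing.

```python
import re

# Characters that are easy to confuse on faded printouts.
# This particular set is an assumption for illustration, not a standard list.
CONFUSABLE = set("O0Il1S5B8Z2")

# A token is any run of letters and digits (labels, registers, constants).
TOKEN = re.compile(r"[A-Za-z0-9]+")


def flag_suspect_tokens(text):
    """Return (line_number, token) pairs worth checking by eye.

    A token is flagged when it mixes letters and digits and contains
    at least one look-alike character, e.g. "1OOH" or "L00P".
    """
    suspects = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in TOKEN.findall(line):
            has_alpha = any(c.isalpha() for c in token)
            has_digit = any(c.isdigit() for c in token)
            has_lookalike = any(c in CONFUSABLE for c in token)
            if has_alpha and has_digit and has_lookalike:
                suspects.append((lineno, token))
    return suspects


if __name__ == "__main__":
    sample = "LXI H,1OOH\nMOV A,B\nJMP L00P"
    for lineno, token in flag_suspect_tokens(sample):
        print(f"line {lineno}: check {token!r}")
```

This does not catch errors like a mistyped register number that is still a valid register; it only shortens the list of places to look.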
On Sat, Jan 22, 2022 at 8:06 PM Ethan O'Toole wrote:
>
> Can the listings be OCR'ed?
>
> - Ethan
>
>
> > Has anyone ever used Amazon Mechanical Turk to employ typi
Can the listings be OCR'ed?
- Ethan
Has anyone ever used Amazon Mechanical Turk to employ typists to type in
old listings of lost code?
Asking for a friend.
--
: Ethan O'Toole
Has anyone ever used Amazon Mechanical Turk to employ typists to type in
old listings of lost code?
Asking for a friend.