On 2019-01-02 7:22 AM, Steve Malikoff via cctalk wrote:
I timed myself how long it would take to clean up Mattis' supplied image so it
might be able to be OCR'd more accurately. Using Paint.NET it took me 23 minutes
to get to the following:
http://web.aanet.com.au/~malikoff/pdp11/dvY973s_cleaned.png
There are still a few little bits I missed, but hap
The only way I've been able to get any type of readable ASCII TEXT
from the .tif's is to do the following for each tif:
convert -density 1200 -resize 40% xaaa.tif -density 1 xaaa120040.tif
Then, OCR it with Irfanview with the KADMOS Plugin Installed.
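For anyone wanting to repeat the convert step over all the extracted pages, here is a rough sketch that only builds and prints the same command line for each .tif. The helper names (convert_command, commands_for) and the SUFFIX constant are made up for illustration; the output naming is assumed from the xaaa120040.tif example above, and the flags are copied unchanged from that command rather than tuned.

```python
import shlex
from pathlib import Path

# Assumption: output names follow the example above (xaaa.tif -> xaaa120040.tif).
SUFFIX = "120040"


def convert_command(src):
    """Return the ImageMagick argument list quoted in the message for one page."""
    dst = src.with_name(src.stem + SUFFIX + src.suffix)
    return ["convert", "-density", "1200", "-resize", "40%",
            str(src), "-density", "1", str(dst)]


def commands_for(paths):
    """Build one command per input page, skipping files that are already outputs."""
    pages = sorted(p for p in paths if not p.stem.endswith(SUFFIX))
    return [convert_command(p) for p in pages]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in commands_for(Path(".").glob("*.tif")):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or each argument list could be handed to subprocess.run once the output looks right. The OCR step in Irfanview is still done by hand.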
For the first Page I get the following ASCII:
y, 1 January 2019 12:18 PM
To: dwight; General Discussion: On-Topic and Off-Topic Posts
Subject: Re: OCR old software listing
have analysed what I could see
and make a judgement, based on what I could see and the general context as I
was typing it in.
Dwight
From: cctalk on behalf of Fred Cisin via cctalk
Sent: Monday, December 31, 2018 9:46 AM
To: General Discussion: On-Topic and Off-Topic Posts
Subject: Re: OCR old s
> On Dec 31, 2018, at 7:13 PM, dwight via cctalk wrote:
>
> Fred is right, OCR is only worth it if the document is in perfect condition.
> I just finished getting an old 4004 listing working. I made only two mistakes
> on the 4K of code that were not the fault of the poorness of the listing.
On 2018-12-31 7:20 AM, Larry Kraemer via cctalk wrote:
I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's from the
multipage .tif file. While the .tif's look decent, and RasterVect shows the
.tif properties to be Group 4 Fax (1bpp) with 5100 x 6600 pixels - 300 DPI,
I can't get tesseract 3.x, TextBridge Classic 2.0, or Irfanview
On 2018-12-29 1:32 AM, Toby Thain via cctalk wrote:
> On 2018-12-29 12:47 AM, Toby Thain via cctalk wrote:
>> On 2018-12-26 4:29 PM, Mattis Lind via cctalk wrote:
>>> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
>>> submitted to DECUS by Bill Seiler.
>>>
>>> The format is sca
On 2018-12-26 4:29 PM, Mattis Lind via cctalk wrote:
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source. Then running OCR on i
> On Dec 26, 2018, at 10:30 PM, Jon Elson via cctalk wrote:
>
> On 12/26/2018 03:29 PM, Mattis Lind via cctalk wrote:
>>
>> A good way to remove the black lines?
>>
>> https://i.imgur.com/dvY973s.png
>>
> Oh, boy! The printer was not properly aligned, so the lines actually o
On 12/26/2018 03:29 PM, Mattis Lind via cctalk wrote:
> A good way to remove the black lines?
> https://i.imgur.com/dvY973s.png
Oh, boy! The printer was not properly aligned, so the lines
actually overlay the dot-matrix printed text! This is going
to make OCR very difficult! I don't think
On Wed, Dec 26, 2018, 17:15 Chuck Guzis via cctalk wrote:
> On 12/26/18 3:17 PM, Al Kossow via cctalk wrote:
> > On 12/26/18 2:55 PM, Steve Malikoff via cctalk wrote:
> >> Scan them all as-is, put them up and 'crowd source' this list
> > And TYPE the programs in again
>
> I've found that it's oft
On Wed, Dec 26, 2018 at 6:15 PM Chuck Guzis via cctalk <cctalk@classiccmp.org> wrote:
> On 12/26/18 3:17 PM, Al Kossow via cctalk wrote:
> >
> > And TYPE the programs in again
>
> I've found that it's often the best course of action and consumes the
> least time overall. You also have a better c
On 12/26/18 3:17 PM, Al Kossow via cctalk wrote:
> On 12/26/18 2:55 PM, Steve Malikoff via cctalk wrote:
>> Scan them all as-is, put them up and 'crowd source' this list
>
> And TYPE the programs in again
I've found that it's often the best course of action and consumes the
least time ov
On 12/26/18 2:55 PM, Steve Malikoff via cctalk wrote:
> Scan them all as-is, put them up and 'crowd source' this list
And TYPE the programs in again
Mattis said
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source. Then running OCR on it. Tried a few
> online versions and tesse
> On December 26, 2018 at 4:29 PM Mattis Lind via cctech wrote:
>
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source.
Finally I got hold of the sources for the PDP-11 SPACE WAR that was
submitted to DECUS by Bill Seiler.
The format is scans of the PAL-11S listing output. It is easy to crop the
image to only contain actual source. Then running OCR on it. Tried a few
online versions and tesseract.
The problem is t