Re: Scanning question (Is destruction of old tech docs a moral crime?)

Guy Dunphy via cctalk Mon, 22 Jul 2019 07:55:41 -0700

At 10:41 AM 21/07/2019 -0600, you wrote:
On Sun, Jul 21, 2019, 4:16 AM Joseph S. Barrera III via cctalk 
<cctalk@classiccmp.org> wrote:
>I'd suggest that in 2019 when bits are cheap and high-quality scanners
>nearly as cheap, "crappy quality digital image" is a bit of a straw man.
>Yes, I've seen plenty of barely-readable or practically unreadable scans,
>but they were made years or decades ago.


There are still plenty of bad scans being done today, for various reasons.
The technology for producing a final digital copy continues to improve and
still has a way to go.
*This* is why I strongly oppose destroying rare docs to scan them now.
Better to wait until non-destructive scanning methods become available.


>What dpi qualifies as not "crappy"? 300dpi? 400? 600?

Points:
1. Both the DPI and the bits/pixel affect the visual result. Having shaded
  pixels on curved edges makes the eye see a smooth curve, where the same
  resolution in two-tone (B&W) would look jagged.
  Achieving an optimal balance of resolution and shading levels for various
  types of content and fineness of detail, vs file size, is a bit of an art.
  But ultimately it's a simple test: look at the paper original, and at your
  final result on screen (at 1:1 final scale). Does the quality look the same?
  Is your copy how the original publisher would have wanted the doc to appear?
  People who only auto-produce PDFs rarely catch on to this, because
  scan-to-PDF tools typically encode page images ONLY as one of: two-tone
  B&W (fax mode), JPG (or rarely JPEG2000), or the execrable JBIG2 (never
  use this!).
  Experiment with PNG encoding, via a tool like Irfanview, which allows
  flexibly setting the PNG bits/pixel and choosing raw, indexed color or gray
  scale. PNG is a lossless encoding, so the only resolution loss is by your
  own choice while rescaling in post-processing.

2. The resolution you scan at and the final presentation resolution won't be
  the same, especially when the pages include elements like screened color or
  B&W images. To deal with these properly you MUST scan at a resolution
  several times higher than the screen frequency (i.e. several samples per
  screening dot). Otherwise there will be moire patterns (beats) between the
  scan sampling and the screening dots.
  Then you post-process to eliminate the screening, and end up with a truly
  tonal image at the resolution the eye would perceive when viewing the
  original screened image.
  This avoids any moire patterning, realizes the original publisher's visual
  intent, and enables minimizing the final file data size.
  B&W text should be encoded with at least 16 gray levels available for edge
  shading, i.e. 4 bits/pixel.
  B&W tonal images need at least 256-level gray scale, or the eye sees
  quantization of shades (aka posterization).
  Colour images need either 24 bits/px, i.e. 8 bits each for RGB, or, if
  there are a limited number of flat colours, an indexed color scheme may
  work: 256 colors or fewer, i.e. an 8-bit index per pixel.
  Typical utilities will generate the color table automatically (which can
  sometimes be a pain).
  Typical scan-to-PDF tools do not expose any of these kinds of user choices.

3. The final page images don't have a 'dots per inch' dimension. They have
  only a total number of pixels in H & V. When doing the final page image
  down-scaling and choice of encoding, you have to make an aesthetic decision
  on final pixel dimensions.
  If your original page was US Letter (8.5" wide; A4 is slightly narrower at
  about 8.27") and you scanned at 600 DPI, that's 5100 pixels wide.
  But you'll likely find that the final copy can be scaled to around 1000 to
  1200 pixels wide, with 4 bits/px (if B&W text), for an on-screen page image
  indistinguishable from the original.

4. All post-processing should be done in 24-bit RGB, at the full scan
  resolution. Keep staged backups.
  NEVER use any indexed color scheme when scaling, rotating, etc. The result
  is unavoidably bad.
  The final two steps should be: rescale to the desired X-Y pixel size, THEN
  down-code to the final color system and file encoding. There's a discussion
  of this in http://everist.org/temp/On_scanning.htm
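
Points 2 and 4 can be tried out on a synthetic patch. What follows is only a
rough sketch, not a real descreening filter: it assumes an axis-aligned
screen whose cell size is known and lines up with the sampling grid (real
screens are angled and need a proper low-pass filter), it works in 8-bit gray
rather than 24-bit RGB to stay short, and all function names are made up for
illustration. Nearest-neighbour scaling stands in for what scaling in an
indexed mode often amounts to, since palette entries can't be blended.

```python
def make_halftone(width, height, cell, side):
    """Synthetic screened patch: a black side x side square in every
    cell x cell tile, on a white background (0 = black, 255 = white)."""
    return [
        [0 if (x % cell < side and y % cell < side) else 255 for x in range(width)]
        for y in range(height)
    ]


def box_downscale(pixels, factor):
    """Average each factor x factor tile into one output pixel."""
    out = []
    for y in range(0, len(pixels) - factor + 1, factor):
        row = []
        for x in range(0, len(pixels[0]) - factor + 1, factor):
            tile = [pixels[y + dy][x + dx]
                    for dy in range(factor) for dx in range(factor)]
            row.append(sum(tile) // len(tile))
        out.append(row)
    return out


def nearest_downscale(pixels, factor):
    """Keep every factor-th pixel; no blending of neighbours."""
    return [row[::factor] for row in pixels[::factor]]


def to_levels(pixels, bits):
    """Down-code 8-bit gray to 2**bits levels by dropping the low bits."""
    return [[v >> (8 - bits) for v in row] for row in pixels]


scan = make_halftone(16, 16, cell=4, side=2)  # 25% ink coverage

# Recommended order: rescale at full depth, THEN down-code.
good = to_levels(box_downscale(scan, 4), bits=4)

# Wrong order: down-code first, then scale without blending.
bad = nearest_downscale(to_levels(scan, bits=4), 4)

print(good[0])  # every pixel is level 11 of 15, a light gray
print(bad[0])   # every pixel is level 0: the patch has turned solid black
```

Averaging over a whole screen cell turns the dots back into the tone they
were meant to represent; sampling without blending just picks whichever dot
or gap happens to sit under each sample.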

In general, 'acceptable' resolution VERY MUCH depends on the content. 
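
To put rough numbers on the figures in point 3, here is a small arithmetic
sketch. It assumes an 8.5" x 11" page (the height isn't given above, so that
part is my assumption), and it counts raw uncompressed pixel data only, so
real PNG or JPG files will be smaller. The helper names are hypothetical.

```python
def pixels(inches, dpi):
    """Pixel count along one edge for a given scan resolution."""
    return round(inches * dpi)


def raw_bytes(width_px, height_px, bits_per_px):
    """Uncompressed image size, with each row padded to a whole byte."""
    row_bytes = (width_px * bits_per_px + 7) // 8
    return row_bytes * height_px


# Working scan: 600 DPI, 24-bit RGB.
scan_w, scan_h = pixels(8.5, 600), pixels(11, 600)
scan_size = raw_bytes(scan_w, scan_h, 24)

# Final page image: 1200 px wide, same aspect ratio, 4 bits/px gray.
final_w = 1200
final_h = round(final_w * scan_h / scan_w)
final_size = raw_bytes(final_w, final_h, 4)

print(scan_w, scan_h, scan_size)     # 5100 6600 100980000
print(final_w, final_h, final_size)  # 1200 1553 931800
```

So the working scan is about 100 MB of raw data per page, while the final
text page image is under 1 MB before any compression.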

>I just scanned my Rainbow 100 User's Manual at 300, 600 and 1200dpi using the 
>scansnap default settings. You see a jump between 300 and 600, but little 
>difference going on up to 1200 for this material. I posted the 300dpi results 
>and even they are acceptable. Some of the diagrams look heavier than the 
>600dpi version and at high zoom you see pixelated letters, where the 600 
>doesn't. The 1200 is hard to see any big difference and takes 4x as long to 
>scan. I think I'll be scanning the remaining rainbow docs at 600dpi. The file 
>is 22MB vs 12MB, so that's worth it. The 1200dpi version was almost 70MB which 
>is starting to get a bit large for a 60 sheet document. The sweet spot seems 
>to be 600dpi, at least for this material.

Just wondering if you're aware of the freeware util Irfanview?  
https://www.irfanview.com/
It's very capable for batch processing large sets of images: rescaling,
changing coding, cropping, etc.
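
If you'd rather script it, the 4 bits/pixel gray PNG coding mentioned in
point 1 can be written with nothing but the Python standard library. This is
a minimal sketch, not a replacement for a real image tool: it uses no row
filtering, truncates 8-bit gray to 16 levels rather than dithering, and the
function names and the tiny test image are made up for illustration.

```python
import struct
import zlib


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray4_png(rows):
    """Encode rows of 8-bit gray values (0-255) as a 4 bits/pixel
    grayscale PNG. Two pixels share one byte, leftmost in the high nibble."""
    height = len(rows)
    width = len(rows[0])
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) for this scanline
        levels = [v >> 4 for v in row]  # 0-255 -> 0-15
        if len(levels) % 2:
            levels.append(0)  # pad the last byte of an odd-width row
        for i in range(0, len(levels), 2):
            raw.append((levels[i] << 4) | levels[i + 1])
    # IHDR: width, height, bit depth 4, colour type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 4, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(raw), 9))
        + _chunk(b"IEND", b"")
    )


# Hypothetical 4x2 test image: a left-to-right gray ramp on both rows.
png_bytes = encode_gray4_png([[0, 85, 170, 255], [0, 85, 170, 255]])
```

Writing `png_bytes` to a file opened in binary mode gives an image any PNG
viewer should open. Do the rescaling first, at full depth, and only feed the
final-size pixels to something like this.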

Guy
