On Fri, Aug 13, 2021, 6:15 PM J. David Bryan via cctech <cct...@classiccmp.org> wrote:
> On Friday, August 13, 2021 at 17:23, Alexandre Souza wrote:
>
> > Is there any kind of standard, recommendation, group, or mailing list
> > to discuss the subject?
>
> I am not aware of any. I started with Al Kossow's basic recommendations,
> modified slightly:
>
> - scan at 600 dpi
> - use TIFF G4 where feasible
> - use tumble to convert to PDF
>
> I then wrote and use a couple of simple image-processing utilities based
> on the Leptonica image library:
>
> http://www.leptonica.org/
>
> ...to clean up the scans (the library makes the programs pretty trivial).
> They start with the raw scans and:
>
> - mask the edges to remove hole punches, etc.
> - size to exactly 8.5" x 11" (or larger, for fold-out pages)
> - remove random noise dots (despeckle)
> - rotate to straighten (deskew)
> - descreen photos on pages into continuous-tone images
> - quantize and solidify screened color areas into solid areas
> - assign page numbers and bookmarks in the PDF
>
> A good example PDF produced by these programs is:
>
> http://www.bitsavers.org/pdf/hp/64000/software/64500-90912_Mar-1986.pdf
>
> The cover is a "solidified" black/gray/white image, manual pages 1-2 and
> 1-4 are continuous-tone JPEG images overlaying bilevel text images, and
> the rest of the pages are masked, deskewed, bilevel text images. The PDF
> bookmarks and logical page numbers are auto-generated from the original
> scan filenames.
>
> The final step is linearizing the PDFs, but I'm wondering whether this is
> still useful.
>
> -- Dave

It is of negative value. Any single container makes a document easier to
handle than a bunch of discrete page files that must be managed as a unit.
Bandwidth is cheaper than human labor. Don't optimize the wrong thing.
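The Leptonica utilities Dave describes are not shown in the thread. As a rough illustration only, two of the cleanup steps he lists (sizing a page to exactly 8.5" x 11" at 600 dpi, and removing random noise dots) might be sketched in plain Python as below. This is a hypothetical sketch, not his code and not Leptonica's API. It assumes a bilevel page held as a list of rows with 1 for black and 0 for white, the helper names (page_pixels, fit_to_page, despeckle) are made up for illustration, and the despeckle only clears single black pixels with no black neighbours, whereas a real despeckle filter would typically also remove small multi-pixel specks.

```python
"""Hypothetical sketch of two scan-cleanup steps: exact page sizing and
a minimal despeckle. Not Leptonica; images are lists of rows, 1 = black."""

DPI = 600  # scan resolution from the recommendations above


def page_pixels(width_in: float, height_in: float, dpi: int = DPI) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page given in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def fit_to_page(img: list[list[int]], width: int, height: int) -> list[list[int]]:
    """Center the image on a white page of exactly width x height pixels,
    padding with white where it is smaller and cropping where it is larger."""
    src_h = len(img)
    src_w = len(img[0]) if img else 0
    out = [[0] * width for _ in range(height)]
    # Offsets that center the source on the page (negative means crop).
    dy = (height - src_h) // 2
    dx = (width - src_w) // 2
    for y in range(src_h):
        ty = y + dy
        if not 0 <= ty < height:
            continue
        for x in range(src_w):
            tx = x + dx
            if 0 <= tx < width:
                out[ty][tx] = img[y][x]
    return out


def despeckle(img: list[list[int]]) -> list[list[int]]:
    """Clear black pixels that have no black 8-connected neighbour."""
    h = len(img)
    w = len(img[0]) if img else 0
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            has_neighbour = any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out


if __name__ == "__main__":
    print(page_pixels(8.5, 11))  # (5100, 6600)
    scan = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],  # isolated dot: removed
        [0, 0, 0, 1, 1],  # 2x2 block: kept
        [0, 0, 0, 1, 1],
    ]
    print(despeckle(scan))
```

At 600 dpi a letter page is 5100 x 6600 pixels, so pure-Python loops like these are only practical for demonstration; a real pipeline would use an image library such as Leptonica.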