On Fri 01 Oct 2021 at 20:53:26 (-0400), Greg Wooledge wrote:
> On Fri, Oct 01, 2021 at 05:44:41PM -0700, Fred wrote:
> > man command | col -b > command.txt
>
> Curious.
>
> unicorn:~$ man ls > ls1
> unicorn:~$ man ls | col -b > ls2
> unicorn:~$ ls -l ls1 ls2
> -rw-r--r-- 1 greg greg 8299 Oct  1 20:49 ls1
> -rw-r--r-- 1 greg greg 7745 Oct  1 20:49 ls2
>
> Glancing at the diff -u between the two files, most of the changes
> appear to be whitespace related.
>
> Opening them both in vim, the second one has a bunch of literal tab
> characters, whereas the first one has no tabs at all -- only spaces.
>
> So I guess most (or all?) of the size reduction is groups of spaces
> being replaced by tabs.

Yes, diff -ubw will confirm that whitespace is the only difference.
I'm too lazy to check that the /apparently/ significant differences
shown by diff and diff -u are merely caused by the left-margin offset,
which makes TABs skip to different columns from those intended.

Like others, I tried to guess what Gene really wanted (I can barely
believe the answer) and to come up with a suitable method. I also
tried out others' suggestions. We don't know what the mysterious
man 9 actually is (unless there's a well-known command named "9"),
so I used man man and man bash¹.

$ man man > /tmp/man.txt
$ man --ascii man > /tmp/man.txt

If you use this, and then intend to print it, have a care what
the width of your window is set to. Too narrow and you get hyphena‐
tion hell; too wide and you have to shrink it down to fit the paper.

$ man -t man > /tmp/man.txt (wrong)
$ man -t man > /tmp/man.ps

Good for people who still have a PostScript workflow. Myself, I prefer:

$ man -t man > /tmp/scratch.ps && ps2pdf /tmp/scratch.ps /tmp/man.pdf

And if you have a big screen, you can postprocess it with, say,

$ pdfjam --vanilla --nup '2x1' --noautoscale true --scale 1 --landscape --papersize '{11in,17in}' --outfile /tmp/man-2up.pdf /tmp/man.pdf

and have double spreads. US paper sizes. ☹

File sizes: Not usually much of an issue nowadays. But let's see:

 66546 /tmp/man.pdf
 72147 /tmp/man.ps
 35932 /tmp/man.txt

and col -b can reduce the size of man.txt a little. But moving on
to bash:

377528 /tmp/bash.pdf
616850 /tmp/bash.ps
345111 /tmp/bash.txt

Not such a big difference now between the text and PDF. And I can
easily close that gap by revealing what I haven't earlier:
/tmp/bash.txt was generated on a 121-wide xterm.

So, running COLUMNS=NNN man --ascii bash > /tmp/bashNNN.txt
to simulate different widths, we get:

377528 /tmp/bash.pdf
381919 /tmp/bash072.txt
371213 /tmp/bash080.txt
355442 /tmp/bash100.txt
345111 /tmp/bash121.txt

And I would finally add that reading through any of /tmp/bash*.txt
is /very/ heavy going, and difficult to absorb, because of the
frequency of special terms, which usually would be coloured or
marked up.

¹ Decades ago, I used to print out big man pages, such as bash,
fvwm and ffmpeg, so that I could read through them in an armchair
(no laptop then). I still generate PDFs for those and others, as
I find the result easiest on the eye. I also find them easiest to
navigate (flicking back and forth) because the formatted pages
have the most distinguishable individual appearance.

Cheers,
David.
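
For anyone wanting to repeat the width comparison above, a loop along these lines should do it. This is only a sketch: it assumes man-db's man (which takes the line length from $COLUMNS when MANWIDTH is unset), the four widths are the ones listed above, and the function name man_sizes_by_width and its optional output-directory argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: render one man page as plain text at several widths and print
# the resulting file sizes. Assumes man-db's man, which uses $COLUMNS
# for the line length when MANWIDTH is unset.
# man_sizes_by_width is a hypothetical helper name, not from the thread.
man_sizes_by_width() {
    page=$1
    outdir=${2:-/tmp}
    for width in 72 80 100 121; do
        # Zero-pad the width so the names sort in order: bash072.txt, ...
        out=$(printf '%s/%s%03d.txt' "$outdir" "$page" "$width")
        COLUMNS=$width man --ascii "$page" > "$out" || return 1
    done
    # Byte count for each file, followed by wc's total line
    wc -c "$outdir/$page"[0-9][0-9][0-9].txt
}
```

Calling man_sizes_by_width bash writes /tmp/bash072.txt through /tmp/bash121.txt and lists their sizes; the exact numbers will depend on the installed version of the page.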