Hi Alex,

I endorse Dave's critique of your measurement. But it was interesting
to me to observe the performance of GNU troff and grotty with literally
billions of lines of input.

At 2023-05-09T15:37:16-0500, Dave Kemper wrote:
> (I can already hear the questions. "Why is the terminal basic unit
> not a line? What IS it then?" The answers are:
>
> 1. I have no idea.
[...]
> 1. Well, I still have no idea, so there's no more I can say about
> that.
[...]
> $ echo '.tm \\n[.V] is \n[.V]' | groff
> \n[.V] is 1
> $ echo '.tm \\n[.V] is \n[.V]' | nroff
> \n[.V] is 40
>
> So an INT_MAX-length terminal page would be INT_MAX / \n[.V] lines
> long. Though probably not even that, because a defensively coded
> "infinite" page length would be something like \n[INT_MAX]-2v.)

Yes, to allow room for one blank line and the page footer.

Regarding the motion quanta of nroff devices, as far as I know the
values of 24 and 40 for .H and .V are historical and go way back to
Ossanna troff, which generalized nroff's design to apply to another
output device, the Graphic Systems C/A/T phototypesetter.

As for why these values are what they are, I have only a semi-educated
guess.

It might be worth keeping in mind that nroff devices weren't originally
character-cell video terminals. They were Teletype machines and line
printers (and, later, daisy-wheel printers--this is apparently the
"Diablo" technology mentioned so prominently in Unix manuals circa
1980). Teletypes were capable of half-line motions and other printers
were also capable of motions smaller than a character cell, if not as
fine-grained as those of a plotter or typesetter.

So _some_ subdivision of the "character cell" was necessary to retain
compatibility with existing nroff targets, and for others that could be
foreseen in 1973.

A. Apparently the narrowest practical space in lead typography is the
   "hair space", which is 1/12th of an em. AT&T troff and its
   descendants support this with the \^ escape sequence. Further
   inter-word and inter-sentence space is quantified in twelfths of an
   em as well (the `ss` request).

   Doubling the division of a character cell on the horizontal axis,
   yielding 24, might have been thought to head off
   rounding/quantization problems. Possibly one could mutter something
   about the Nyquist-Shannon sampling theorem here.

B. The choice of 40 as the vertical dimension seems to me a little more
   mysterious. Notoriously, and unlike CJK ideograms, Western
   alphabetic letter forms are generally not square, and they require
   rectangular bounding boxes that are taller than they are wide. This
   gets us to a number that is bigger than 24...but why 40? An aspect
   ratio of 5:3 seems reasonable but I don't know that it's the only
   reasonable choice.

   For setting mathematics you want fairly fine control over vertical
   motions. At a default type size of 10 points, it's easy to imagine
   wanting to be able to support vertical motions as small as 1 point.

   There are 72 points to the inch: 6 lines of 10-point type on
   12-point vertical spacing. That gives us 72/6=12. But a number
   bigger than 24 was desired.

   My guesswork runs out here. Why 40 and not 36 or 48? I suppose 40
   was the best aspect ratio of the three. Having a large lowest
   common multiple for the horizontal and vertical resolutions would
   also seem to have some nice properties, though off the top of my
   head I can't offer any reasons I feel are strong.

We may need a Bell Labs CSRC veteran to speak to this issue. I hope the
information didn't pass away with Joe Ossanna.

Oh, and incidentally, I did a Fermi estimate of how many Encyclopedia
Britannicas would be the equivalent of a 2 billion-line man page, and
the answer is: about ten. (An EB has about 40 million words.) I'm not
worried about a man page approaching that limit.

Regards,
Branden
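
For anyone who wants to check the page-length arithmetic above, here is
a rough sketch in Python. It assumes a 32-bit signed INT_MAX and uses
the nroff value of 40 for \n[.V] shown in the quoted output. The figure
of 8 words per output line is purely an assumption for illustration and
does not come from the message; with it, a page of INT_MAX basic units
works out to roughly the "about ten" Britannicas mentioned.

```python
# Rough check of the terminal page-length arithmetic.
# Assumptions (not from the original message): 32-bit INT_MAX and
# WORDS_PER_LINE; the other figures are quoted in the thread.
INT_MAX = 2**31 - 1        # assumed 32-bit signed int
V_NROFF = 40               # \n[.V] as reported by nroff
WORDS_PER_LINE = 8         # assumption: typical filled terminal line
EB_WORDS = 40_000_000      # Encyclopedia Britannica size quoted above

# Longest page expressed in output lines: INT_MAX / \n[.V].
max_lines = INT_MAX // V_NROFF

# "Defensive" variant that leaves 2v of room, as in \n[INT_MAX]-2v.
defensive_lines = (INT_MAX - 2 * V_NROFF) // V_NROFF

# Fermi estimate of how many Britannicas fit on such a page.
britannicas = max_lines * WORDS_PER_LINE / EB_WORDS

print(f"lines on an INT_MAX-unit page: {max_lines:,}")
print(f"lines with 2v held back:       {defensive_lines:,}")
print(f"Britannica equivalents:        {britannicas:.1f}")
```

Changing WORDS_PER_LINE moves the estimate proportionally, so the
result is only good to within a small factor, as befits a Fermi
estimate.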