Hi Alex,

At 2023-03-25T20:38:39+0100, Alejandro Colomar wrote:
> On 3/25/23 09:40, G. Branden Robinson wrote:
[man pages using the `hw` *roff request to override hyphenation]
> > This is true, and I do on rare occasions see man pages doing this.
>
> Maybe it would be a good thing for man.local? For manual pages, it
> seems a bit too repetitive, IMO. Maybe for a manual page that uses
> very rare keywords would be a good place to use those.
The man.local file is not something that packages installing man pages can update.

My first thought was that distributors could support a man.local.d directory where packages could install files: packages with man pages would provide short files within that directory containing `hw` requests, and distributors would change man.local to read each of those files with `so` or `soquiet` (groff 1.23). We could call these "hyphenation override files".

But there are still several problems.

1. `so` and `soquiet` don't perform glob expansion. So man.local would _still_ need to be edited to name every such file installed by a package that provides man pages and needs this feature.

2. man.local _could_ source a single file, updated by some trigger or post-installation script, that lists the hyphenation override files of all packages containing man pages.

3. These files could override each other. What if one package wants a word hyphenated one way and another package has a different preference? Worse, what if a man page expects groff's hyphenation patterns to apply to a word, but some unrelated package has gone and stomped all over it? Deciding who has precedence seems intractable.

4. Nothing prevents a package from populating the override files with things other than `hw` requests. Not only is this potentially nefarious, but because every override file would get read for every man page rendered, someone else's botched hyphenation override file would ruin every man page you try to read.

My net takeaway from this is that it is indeed better to keep hyphenation overrides within individual man pages. But maybe the only way to find out how tedious this really is would be to see how much a large, practical corpus of man pages, like the Linux man-pages, requires it.

However,

5. Now that serial processing of man pages is practical (e.g., "groff -man page.1 page.2 page.3 anotherpage.1" and so on), item #3 above rears its head even without any shenanigans involving man.local.
   That file could be empty or nonexistent and this would still be an issue. The "good" news is that most people don't bother to serially render pages, and it will likely be a while, if ever, before man-db man(1) exercises this feature. Still, the threat exists.

One of the themes of my suggested revisions to GNU troff has been to provide previously unavailable ways to unwind or reset things. One of those is environment removal (Savannah #60954). Another that has occurred to me is hyphenation override removal. Today, invoking the `hw` request without arguments does nothing. We could change it to clear any existing hyphenation overrides. Or, perhaps better, we could add an `hwrm` or `rhw` request: if given arguments, it reads each word (ignoring hyphens), matches it against the existing list of overrides, and removes the word if found. If given no arguments, it removes all overrides. Then an.tmac (and doc.tmac) could call it when hitting `TH` (and `Dd`) macros, tidying up the state of the formatter for the next document.

> > I'm ambivalent about the use of the `hw` request in man pages.
> >
> > 1. I like the clarity of the "never use *roff requests" rule. My
> > internal bright-line rule enforcer is enamored of this
> > principle. It keeps the fingers of the novice out of the meat
> > grinder.
[...]
> > 5. We could hold to principle #1 by adding a man(7) macro, `HW`,
> > which simply wraps `hw`.
>
> What's the gain of such a thing?

Adherence to Puritan principle; I'd be able to keep pronouncing from on high in my ivory tower that one shouldn't invoke *roff requests in man page documents.

> Translators will have the same problem translating .hw than
> translating .HW. The only difference is that .HW would appear in the
> man(7) spec, which would force them to recognize it, as opposed to
> saying "we don't support plain roff".

Right. That's worth something to me, though maybe not enough in this case to pay its freight.
I have unrelated similar cases in mind.[1]

> .MR will only improve the status quo, so if not many complained till
> now (I did, but 1 is not too many), there will be even less need for
> .hw soon.

Yes. Cutting down the need for `hw` or `\%` is one of its advantages.

> I'd say, let's defer this problem for long after .MR, and see if there
> are any remaining issues. Same with \%, which is why I don't yet want
> to introduce it in the Linux man-pages.

Fair enough. The other side of the coin stamped "PORTABLE" is that nobody said you have to use all of the features that are. :)

Regards,
Branden

[1] I also find use of `br` excusable in man pages when no existing man(7) macro will serve. (You _could_ drop in an `RS`/`RE` pair with nothing between, or call `RE` or `EE` "unpaired", but those are pretty kludgy.) Yes, we could introduce a `BK` macro (`BR` is already taken), but it just doesn't seem worth the trouble to me. The only use I've found for `br` in groff's man pages is immediately preceding an `ne` request to manage widows and orphans. I am hoping those won't stay around forever, because we will (1) implement `KS` and `KE` macros for managing keeps, as discussed earlier on this list, and/or (2) format all paragraphs (and maybe (sub)section headings) in a diversion, and then only permit page breaks in reasonable locations.[2] This wouldn't be Knuth-Plass, but it would be a big help, and I have a sketch in my mind of how to get it done with a diversion trap.

[2] Here's the sketch. Gather (sub)section headings and paragraphs into diversions. Any paragraph of 3 output lines or fewer cannot have a page break within it. Collect the first two output lines of a paragraph into a diversion (appending to the one used for the heading, if any). Then start a new diversion for further paragraph text. Once you've set the fourth output line of a paragraph, measure the space available to the bottom of the page (the distance to the next page location trap, \n[.t]).
If this amount is more than the height of the (heading and the) first two lines of the paragraph, emit that diversion. Otherwise, break the page, emit both diversions, and stop diverting. This could still leave an orphaned line if a paragraph were longer than a page by one output line at the end, but that case seems rare enough that it needn't be tackled at first. Maybe someone can see a flaw in this. Once worked out for man(7), it could be applied to all of our other macro packages.
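To make the footnote [2] sketch easier to poke holes in, here is a minimal Python model of just the placement decision, assuming the diversions already exist. It is not *roff: heights are integers in basic units, and `distance_to_trap` stands in for `\n[.t]`. The names `Paragraph`, `Decision`, and `place` are hypothetical. Two details are assumptions the sketch doesn't state: a paragraph of three lines or fewer moves together with its heading and "fits" if its total height doesn't exceed the remaining space, and the "more than" test for longer paragraphs is taken literally as a strict comparison.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    KEEP_HERE = "emit heading and whole paragraph on this page"
    HEAD_HERE = "emit heading + first two lines here; the rest follows"
    BREAK_FIRST = "break the page, then emit the diverted text"


@dataclass
class Paragraph:
    heading_height: int        # 0 if no (sub)section heading precedes it
    line_heights: list[int]    # height of each output line, in basic units


def place(par: Paragraph, distance_to_trap: int) -> Decision:
    r"""Apply the keep sketch; distance_to_trap models \n[.t]."""
    lines = par.line_heights
    if len(lines) <= 3:
        # No page break may fall inside a short paragraph, so the heading
        # and all of its lines move as one unit (assumed: "fits" means
        # the unit's height does not exceed the remaining space).
        unit = par.heading_height + sum(lines)
        if unit <= distance_to_trap:
            return Decision.KEEP_HERE
        return Decision.BREAK_FIRST
    # Four or more lines: the first diversion holds the heading plus
    # output lines 1 and 2; the sketch's "more than" is taken as strict.
    first = par.heading_height + sum(lines[:2])
    if distance_to_trap > first:
        return Decision.HEAD_HERE
    return Decision.BREAK_FIRST


five = Paragraph(heading_height=12, line_heights=[12] * 5)
print(place(five, 40).name)   # HEAD_HERE: 40 > 12 + 24
print(place(five, 36).name)   # BREAK_FIRST: 36 is not more than 36
```

With 12-unit lines and a 12-unit heading, a five-line paragraph needs more than 36 units of space for its first diversion to stay on the current page; otherwise the page breaks first. What happens to the third and later lines after `HEAD_HERE` is left to ordinary page breaking, as in the sketch.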
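The override-removal idea from the body of the message (the proposed `hwrm` or `rhw` request) and the serial-rendering leak described in item 5 can likewise be modeled in a few lines of Python. This is a toy, not GNU troff: the class and method names are hypothetical stand-ins, `start_document` stands in for what `TH` (or `Dd`) might do, and the model assumes that words are keyed by their spelling with hyphens removed and that a later `hw` for the same word replaces an earlier one.

```python
from __future__ import annotations


class HyphenationOverrides:
    """Toy model of a formatter's list of `hw` hyphenation overrides.

    Hypothetical assumptions: words are keyed by their spelling with
    hyphens removed, and a later `hw` for a word replaces an earlier one.
    """

    def __init__(self) -> None:
        self._words: dict[str, str] = {}

    def hw(self, *words: str) -> None:
        """Record each word, hyphens marking the permitted break points."""
        for word in words:
            self._words[word.replace("-", "")] = word

    def hwrm(self, *words: str) -> None:
        """Proposed request: remove the named overrides, or all of them."""
        if not words:
            self._words.clear()
            return
        for word in words:
            # Hyphens in the argument are ignored when matching;
            # words with no override are silently skipped.
            self._words.pop(word.replace("-", ""), None)

    def lookup(self, word: str) -> str | None:
        """Return the override for a word, or None if patterns apply."""
        return self._words.get(word)


def start_document(overrides: HyphenationOverrides, reset: bool) -> None:
    """Hypothetical stand-in for what a TH (or Dd) call might do."""
    if reset:
        overrides.hwrm()


# Serial rendering, as in "groff -man page.1 page.2":
ov = HyphenationOverrides()
start_document(ov, reset=False)
ov.hw("mal-loc")                  # page.1 wants its own break point
start_document(ov, reset=False)   # page.2 begins without a reset
print(ov.lookup("malloc"))        # mal-loc: page.1's override leaks
start_document(ov, reset=True)    # with a reset at TH...
print(ov.lookup("malloc"))        # None: page.2 sees no override
```

Without the reset, an override set while formatting page.1 is still in force for page.2; with it, each document starts with no overrides in force, which is the tidying-up the message proposes for an.tmac and doc.tmac.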