Update of bug #64202 (project groff):

Summary: [man-pages]: groff_man(7) inconsistently (and redundantly) guards some .MR references with '\%' => [man pages] groff_man(7) inconsistently (and redundantly) guards some .MR references with '\%'

_______________________________________________________

Follow-up Comment #4:

[comment #3 comment #3:]
> [comment #1 comment #1:]
> > Hi Keith,
> >
> > I'm aware of this. It's deliberate insofar as it's a consequence of other decisions.
> >
> > The main facts are these:
> >
> > 1. The new `MR` macro unconditionally prefixes its first argument with a `\%` escape sequence to suppress hyphenation.
>
> That's what I thought. Consequently, there is _absolutely_ no need for references, such as '.MR \%topic n', to _ever_ add that redundant '\%' prefix to the topic name.

...and there are no cases of it doing so in the groff tree,

$ git grep 'MR.*\\%' || echo NONE
NONE

so your stridency here is a bit puzzling.

> > 2. All of _groff_'s man pages (.[157]) files are produced in the build tree from .man inputs.
>
> Again, I'm well aware of this, but the '*.man' sources _do not_ specify the redundant prefix,

Agreed.

> (other than incidentally, via a malformed transform for a '@g@' prefix).

That's not incidental, it's deliberate.

> And, therein lies the bug ... for it _is_ a bug. The intent of '@g@' is to add a program name prefix -- typically 'g' for GNU programs -- so that 'tbl' becomes 'gtbl', when appropriate; it has _absolutely no business_ to _ever_ include '\%' as part of that prefix.

Why not? According to DWB, Heirloom Doctools, and GNU troffs, it's idempotent when repeated at the beginning of a word.

$ cat EXPERIMENTS/hyphenation-point.roff
.ll 3n
foo
\%foo
\%\%foo
\%\%\%foo
A \%\%\%foo
AB \%\%\%foo
ABC \%\%\%foo
ABCD \%\%\%foo
.pl \n(plu
$ nroff -Wbreak EXPERIMENTS/hyphenation-point.roff
foo
foo
foo
foo
A
foo
AB
foo
ABC
foo
ABCD
foo

(I suppressed warnings because they're not relevant here; only spurious hyphens at the start of a word would be, and those would be visible in the output anyway. I also tried all three formatters with line lengths of 4n and 5n; this also failed to cause spurious hyphenation.)

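To make the '@g@' discussion concrete, here is a minimal sketch of the kind of substitution being argued about. It is not the rule from groff's Makefile.am; the function name `expand_g` and its prefix argument are hypothetical, chosen only for illustration. It shows how a source line such as '.MR @g@tbl 1' ends up with a leading `\%` on the topic name.

```shell
#!/bin/sh
# Sketch only: this is NOT the rule from groff's Makefile.am.
# expand_g and its prefix argument are hypothetical names.

# Read roff source on stdin and replace each '@g@' with '\%' followed by
# the given program-name prefix (for example 'g', or '' for no prefix).
# Assumption: the prefix contains no characters special to sed ('|', '&', '\').
expand_g() {
    sed -e 's|@g@|\\%'"$1"'|g'
}

# A page source line that refers to tbl(1) through the MR macro.
printf '%s\n' '.MR @g@tbl 1' | expand_g g
# prints: .MR \%gtbl 1
```

With an empty prefix the same input becomes '.MR \%tbl 1'. Because `MR` prefixes its first argument with `\%` itself, the formatter then sees a word beginning with two `\%` escape sequences, which is the repeated-prefix case the experiment above exercises.
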
Formatters are prepared to handle inputs like this, and so too should macro packages be, if they want to claim general utility.

> (FWIW, the seat of the bug is within the substitution for '@g@', as it is specified in the generated Makefile, at the point where 'topic.n' is generated from 'topic.n.man').

It's done for some other replacements as well.

commit d84d9e1d85287b24d14001a6fdcbaa9cfc588d55
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Sun Feb 20 05:21:36 2022 +1100

    Makefile.am: Use hyphenation control escapes more.

    * Makefile.am (.man): Prefix hyphenation control escape sequences to
    more configuration-time interpolations to prevent their hyphenation:
    @DEVICE@, @g@, @INDEX_SUFFIX@, @PAGE@, @TMAC_{AN,M,S}_PREFIX@,
    @TMAC_MDIR@.

(That commit message is a little unfortunate. It should say "configuration-dependent", not "configuration-time".)

> Understood. However, the intent of '@g@' should _not_ be subverted, for this unrelated purpose ... either specify '\%' _explicitly_, in any context where it is intended, or introduce a specific transform, other than '@g@' itself, which implies the effect of '\%@g@'.

The purpose is not being subverted. You said yourself that the "seat" of this behavior is Makefile rules for generating .[157] from .man. It would be wrong to do so in "makevarescape.sed", for instance, because '@g@' and friends get expanded in contexts other than _roff_ sources. Moreover, valid _roff_ input is indeed being produced.

> I think that this is an insidious bug, which should be fixed.

I checked out your attached PDF and it looks quite nice to me. The problem with the hyperlinks is clear, and as you described; a stray percent sign is getting into some of the hyperlink targets you generate. This is not the fault of the formatter or the man page sources. If it were, then the hyperlinks that groff Git produces would have the same problem.

$ ./build/test-groff -t -rU1 -man -Tutf8 -Z ./build/tmac/groff_man.7 | grep 'x X' | tail -n 20
x X devtag:.NH 1
x X devtag:.eo.h
x X tty: link man:tbl(1)
x X tty: link
x X tty: link man:eqn(1)
x X tty: link
x X tty: link man:refer(1)
x X tty: link
x X tty: link man:man(1)
x X tty: link
x X tty: link man:groff_mdoc(7)
x X tty: link
x X tty: link man:groff_man_style(7)
x X tty: link
x X tty: link man:groff(7)
x X tty: link
x X tty: link man:groff_char(7)
x X tty: link
x X tty: link man:man(7)
x X tty: link
$ ./build/test-groff -t -man -Thtml -Z ./build/tmac/groff_man.7 | grep 'x X' | tail -n 25
x X devtag:.br
x X html:<a href="man:tbl(1)">
x X html:</a>
x X html:<a href="man:eqn(1)">
x X html:</a>
x X html:<a href="man:refer(1)">
x X html:</a>
x X devtag:.sp 1
x X devtag:.br
x X html:<a href="man:man(1)">
x X html:</a>
x X devtag:.sp 1
x X devtag:.br
x X html:<a href="man:groff_mdoc(7)">
x X html:</a>
x X devtag:.sp 1
x X devtag:.br
x X html:<a href="man:groff_man_style(7)">
x X html:</a>
x X html:<a href="man:groff(7)">
x X html:</a>
x X html:<a href="man:groff_char(7)">
x X html:</a>
x X html:<a href="man:man(7)">
x X html:</a>

This is why I mentioned the following point in comment #1.

> You do not _need_ to sanitize content destined for device control escape sequences (or the `device` request) of the `\%` escape sequence. The formatter will ignore this escape sequence in that context, skipping over it without diagnostic, and it will not appear in the "x X" commands that GNU troff produces. This is already the case in groff 1.22.4 and therefore I suspect it's been true for many years.

Are you wrapping or replacing the `MR` macro and "sanitizing" its first argument for some other purpose?

You said:

> (which, in its present state of development, does not incur any address sanitizer overhead)

...which I didn't completely understand, as ASAN doesn't seem relevant to the present discussion of _roff_ macro processing.

Leaving in "Need Info" status, as I'm stuck; I don't agree with your implication that repeated leading \% escape sequences in a word are invalid _roff_ input, and I don't have enough insight into the implementation you're working on to offer advice. Maybe you could share some of its code.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64202>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/