Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

2023-10-23 Thread Alejandro Colomar
Hi Branden,

On Mon, Oct 23, 2023 at 04:15:23AM -0500, G. Branden Robinson wrote:
> [dropped t...@mandoc.bsd.lv from Cc; I'm not subscribed, so it
> deep-sixes my mails]

[Added groff@, to have some mailing list]

> 
> At 2023-10-19T23:32:42+0200, Alejandro Colomar wrote:
> > > > $ cat nested_indent.man 
> > > > .TH nested_indent 7 2023-10-19 experiments
> > > > .SH Ingo said:
> > > > .TP
> > > > Todo
> > > > Currently, when formatting .TP or .IP with a non-empty head,
> > > > [yada yada]
> > > > .RS
> > > > .PP
> > > > When formatting .IP or .RS with an empty head, mandoc needs
> > > > [yada yada]
> > > > .RE
> > > > 
> > > > As you can see, here the indentation is controlled by a single
> > > > RS/RE pair, and everything within it uses PP as a normal paragraph
> > > > separator.
> > > 
> > > While that also generates correct terminal and typographical (PS,
> > > PDF) output in the same purely presentational sense as .TP .IP .TP,
> > > it does not help with respect to the semantic problem we are
> > > discussing here.
> 
> My approach when requiring indentation "under" a tagged paragraph has
> been to use `RS`/`RE` twice; once to align _normal_ paragraph
> inset with the tagged one's indentation, and once to inset relative to
> the tagged paragraph.  Understanding this matter was one of the factors
> that drew me into groff and man(7) development.[1]
> 
> > > You see that the first .TP, the .RS, and the second .TP are all
> > > child nodes of the top-level .SH.  The .RS is not a child of the .TP
> > > but a sibling.  The two .TP nodes still aren't siblings of each
> > > other.
> 
> This is a reasonable interpretation, and consistent with how we present
> RS/RE in groff_man(7)--alongside SH, SS, and EX/EE.  You can't nest any
> of those under a paragraph, either.
> 
> > > Now on first sight, you might blame me for that and call it a mandoc
> > > artifact, arguing that mandoc instead ought to treat the .RS as a
> > > child of the first .TP.  But no, that would be incorrect parsing
> > > for the following reason: the .TP implies an indentation, and
> > > the .RS also implies an indentation.  If the .RS were a child of
> > > the .TP, we would get double indentation.  You can make that
> > > argument even more convincing by adding a width argument to .RS
> > > and varying that argument.  That way, you see that the .RS is
> > > indented relative to the .SH, not relative to the .TP.
> 
> That's correct, and consistent with how I've documented this vexing
> issue in groff_man_style(7).
> 
> > > There are some cases where it is not completely clear whether one
> > > man(7) node following another man(7) node is a child or a sibling.
> > > mandoc(1) makes arbitrary choices in such ambiguous cases, usually
> > > opting for sibling relations where possible and avoiding unnecessary
> > > child relationships.  But this is not an ambiguous case.  Just like
> > > the .IP, the .RS is definitely a sibling and not a child of the .TP.
> > > As i said, no block can nest inside .TP.
> 
> I agree, but I may need to review and reconsider the second paragraph of
> this advice from groff_man_style(7), then.
> 
>        • .RS doesn’t indent relative to my indented paragraph.
>          The .RS macro sets the left margin; that is, the position
>          at which an ordinary paragraph (.P and its synonyms) will
>          be set.  .IP, .TP, and the deprecated .HP use the same
>          default indentation.  If not given an argument, .RS moves
>          the left margin by this same amount.  To create an inset
>          relative to an indented paragraph, call .RS repeatedly
>          until an acceptable indentation is achieved, or give .RS
>          an indentation argument that is at least as much as the
>          paragraph’s indentation amount relative to an adjacent .P
>          paragraph.  See subsection “Horizontal and vertical
>          spacing” above for the values.
>
>          Another approach you can use with tagged paragraphs is to
>          place an .RS call immediately after the paragraph tag;
>          this will also force a break regardless of the width of
>          the tag, which some authors prefer.  Follow‐up paragraphs
>          under the tag can then be set with .P instead of .IP.
>          Remember to use .RE to end the indented region before
>          starting the next tagged paragraph (at the appropriate
>          nesting level).
> 
> I'll see if I can undertake some experiments with mandoc(1) to see if it
> works consistently with groff man(7) (and every other man(7), as far as
> I know).  It seems like it might, because having an `RS` after the tag
> of a `TP` paragraph might close the "block" that mandoc(1) imputes to
> the `TP` macro when building the AST.
> 
> > > But very frequently, situations arise whe

Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

2023-10-23 Thread G. Branden Robinson
Hi Alex,

At 2023-10-23T12:47:21+0200, Alejandro Colomar wrote:
> On Mon, Oct 23, 2023 at 04:15:23AM -0500, G. Branden Robinson wrote:
> [Added groff@, to have some mailing list]

Ah, okay, well, in so doing you have exposed some brain farts of mine to
the world.  But I have no doubt that Ingo would have caught them, and
possibly pounced, so I take this opportunity to correct myself.

> > My approach when requiring indentation "under" a tagged paragraph
> > has been to use `RS`/`RE` twice; once to align _normal_ paragraph
> > inset with the tagged one's indentation, and once to inset relative
> > to the tagged paragraph.  Understanding this matter was one of the
> > factors that drew me into groff and man(7) development.[1]

I didn't fulfill the promise of this footnote.  Here it is.

https://lists.gnu.org/archive/html/groff/2017-08/msg00028.html

> > This is a reasonable interpretation, and consistent with how we
> > present RS/RE in groff_man(7)--alongside SH, SS, and EX/EE.  You
> > can't nest any of those under a paragraph, either.

This is obviously wrong.  EX/EE "nest" within any kind of paragraph just
fine.  My brain seized upon a generalization and ran too far with it.

I withdraw the observation.  What I can say is that SH, SS, and RS/RE
all affect the amount by which the text is inset, and EX/EE do not.  Our
man pages already say that, though.

> > I'll see if I can undertake some experiments with mandoc(1) to see
> > if it works consistently with groff man(7) (and every other man(7),
> > as far as I know).  It seems like it might, because having an `RS`
> > after the tag of a `TP` paragraph might close the "block" that
> > mandoc(1) imputes to the `TP` macro when building the AST.

I felt a little guilty writing that and not going ahead and presenting
an exhibit, and publishing my laziness to a mailing list compounds my
embarrassment.

On the bright side, I took a quick stab at it, and (A) it worked as I
expected on the first try; and (B) groff and mandoc are in perfect
harmony here.

$ cat ATTIC/nests-inside-tagged-paragraphs.man
.TH foo 1 2023-10-23 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
.TP
tag1
Prepare the turbo encabulator.
.RS
.RS
.EX
$ \c
.B "turbenc \-\-start"
.EE
.RE
It's up and running fine now.
.RE
.TP
tag2
.RS
Shut down the turbo encabulator.
.RS
.EX
$ \c
.B "turbenc \-\-halt"
.EE
.RE
It slowly grinds to a halt.
.RE
.P
All done now.
$ nroff -Tascii -man ATTIC/nests-inside-tagged-paragraphs.man
foo(1)                      General Commands Manual                     foo(1)

Name
       foo - frobnicate a bar

Description
       tag1   Prepare the turbo encabulator.
                     $ turbenc --start
              It's up and running fine now.

       tag2
              Shut down the turbo encabulator.
                     $ turbenc --halt
              It slowly grinds to a halt.

       All done now.

groff test suite                  2023-10-23                            foo(1)
$ mandoc -Tascii -man ATTIC/nests-inside-tagged-paragraphs.man | ul
foo(1)                      General Commands Manual                     foo(1)

Name
       foo - frobnicate a bar

Description
       tag1   Prepare the turbo encabulator.
                     $ turbenc --start
              It's up and running fine now.

       tag2
              Shut down the turbo encabulator.
                     $ turbenc --halt
              It slowly grinds to a halt.

       All done now.

groff test suite                  2023-10-23                            foo(1)

Et voilà.

> I'm not yet convinced of the superiority of mdoc(7), as you probably
> guessed.  I fear that I might face similar problems in other areas if
> I were to switch to it.

In a sense, it can't be flawed because it has no spec; but the other
edge of that blade says it can't be flawless, because it has no spec.

Same's true of man(7), of course.

My mad crush on formal verification is not going anywhere soon.

> What I think I'm convinced of is that there's no perfect way of dealing
> with indentation in man(7).  .TP/.RS/.RE might be the most consistent
> one.  I just wish it didn't introduce a break after the tag.

Can you name a man-pages document where this problem is itching you?
Probably many--show me just one at first.

I'd like to see how well the half-designed, existing-only-in-my-head
`LS`/`LE` extension macros for man(7) I've proposed might address it.

> The good thing about TP/RS/RE is that you could think of it like
> braces in C:  if (...) foo(); doesn't need them; if (...) {foo();
> bar();} does.  Similarly, mandoc could consider an RS directly
> following a TP as the wrapper for the tagged section.

That _seems_ to be what it's doing?  Don't get hung up on the
indentation level; it appears to be bracketing the `TP` body to me.

If the input makes sense, and the output looks right, I wouldn't worry
too much about how mandoc(1) presents the tree.  I consider that a
debugging aid.

$ mandoc -Ttree -man ATTI

Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

2023-10-23 Thread Alejandro Colomar
Hi Branden, Ingo,

On Mon, Oct 23, 2023 at 08:36:11AM -0500, G. Branden Robinson wrote:
[...]
> > What I think I'm convinced of is that there's no perfect way of dealing
> > with indentation in man(7).  .TP/.RS/.RE might be the most consistent
> > one.  I just wish it didn't introduce a break after the tag.
> 
> Can you name a man-pages document where this problem is itching you?
> Probably many--show me just one at first.

There's no formatting bug.  The problem is lack of consistency in the
man(7) source, which confuses contributors.  I need to fix patches from
contributors that use PP where IP should go, or the other way around,
because they just don't understand why each of them is used in each
case.

The thing is, when I was first exposed to man(7) 3 years ago, I
remember this inconsistency being one of the hardest things to wrap my
head around.

Wrapping everything in RS/RE so that one can use PP to continue within
a TP would also make it easy to move text from within a TP to a
non-indented section, since the indentation is only written once, in the
RS.  It's more DRY.
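To make the DRY point concrete, here is a minimal sketch that writes the same tagged paragraph in the two styles discussed in this thread: continuing it with an untagged .IP, and wrapping its body in .RS/.RE so that follow-up paragraphs use plain .PP.  The file names and option text are hypothetical.  As the groff_man_style(7) excerpt quoted earlier notes, an .RS right after the tag forces a break after it, which is the trade-off mentioned above.

```shell
#!/bin/sh
# Sketch: the same hypothetical tagged paragraph in two man(7) styles.

# Style A: the follow-up paragraph is an untagged .IP, so moving it
# out of the list also means changing the macro to .PP.
cat > tp-ip.man <<'EOF'
.TP
.B \-\-start
Start the turbo encabulator.
.IP
It may take a while to spin up.
EOF

# Style B: .RS right after the tag sets the inset once; follow-up
# paragraphs use the ordinary .PP, and .RE closes the region before
# the next .TP.  Moving the text out only means dropping .RS/.RE.
cat > tp-rs.man <<'EOF'
.TP
.B \-\-start
.RS
Start the turbo encabulator.
.PP
It may take a while to spin up.
.RE
EOF
```

In style B the paragraph macros inside the region are the same ones used outside any list, so text can be moved in or out of the tagged paragraph without touching them.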

> 
> I'd like to see how well the half-designed, existing-only-in-my-head
> `LS`/`LE` extension macros for man(7) I've proposed might address it.

I think keeping it DRY would sum up what to aim for.

> 
> > The good thing about TP/RS/RE is that you could think of it like
> > braces in C:  if (...) foo(); doesn't need them; if (...) {foo();
> > bar();} does.  Similarly, mandoc could consider an RS directly
> > following a TP as the wrapper for the tagged section.
> 
> That _seems_ to be what it's doing?  Don't get hung up on the
> indentation level; it appears to be bracketing the `TP` body to me.

Yeah... in terminals.  But in HTML, things go wrong.  See


> 
> If the input makes sense, and the output looks right, I wouldn't worry
> too much about how mandoc(1) presents the tree.  I consider that a
> debugging aid.

It's not only that, but also the HTML output.

> 
> $ mandoc -Ttree -man ATTIC/nests-inside-tagged-paragraphs.man \
>   | sed -n '/Description/,$p'
>       Description (text) 4:5
>   SH (body) 4:2
>       TP (block) *5:2
>         TP (head) 5:2 ID=HREF
>             tag1 (text) *6:1
>         TP (body) 6:1
>             Prepare the turbo encabulator. (text) *7:1.
>       RS (block) *8:2
>         RS (head) 8:2
>         RS (body) 8:2
>             RS (block) *9:2
>               RS (head) 9:2
>               RS (body) 9:2
>                   EX (elem) *10:2
>                   $ \c (text) *11:1 NOFILL
>                   B (elem) *12:2 NOFILL
>                       turbenc \-\-start (text) 12:4 NOFILL
>                   EE (elem) *13:2 NOFILL
>             It's up and running fine now. (text) *15:1.
>       TP (block) *17:2
>         TP (head) 17:2 ID=HREF
>             tag2 (text) *18:1
>         TP (body) 18:1
>       RS (block) *19:2
>         RS (head) 19:2
>         RS (body) 19:2
>             Shut down the turbo encabulator. (text) *20:1.
>             RS (block) *21:2
>               RS (head) 21:2
>               RS (body) 21:2
>                   EX (elem) *22:2
>                   $ \c (text) *23:1 NOFILL
>                   B (elem) *24:2 NOFILL
>                       turbenc \-\-halt (text) 24:4 NOFILL
>                   EE (elem) *25:2 NOFILL
>             It slowly grinds to a halt. (text) *27:1.
>       PP (block) *29:2
>         PP (head) 29:2
>         PP (body) 29:2
>             All done now. (text) *30:1.

This got me curious about TQ, since mandoc(1) considers it "very rarely
used, even in GNU pages".

Ingo, you may want to reword that, since TQ was being used in the Linux
man-pages project, and yesterday I wrote a patch to use it even more:


Currently, it's used in 684 pages, 3108 times in total.

$ grep -rH '^\.TQ' | uniq | wc -l
684
$ grep -rH '^\.TQ' | wc -l
3108
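The uniq in the first pipeline only collapses *adjacent* identical lines, so it yields one line per page only as long as every matching line within a page is identical (for example, a bare .TQ).  A minimal sketch that counts pages directly with grep -l instead; the function names and the script name in the comment are hypothetical, and -r, while not required by POSIX, is supported by both GNU and BSD grep:

```shell
#!/bin/sh
# Sketch: count man pages that call .TQ, and the total number of calls,
# under a directory tree (default: the current directory).

# grep -l prints each matching file name once, so no uniq is needed
# even when a page contains differing .TQ lines.
count_tq_pages() {
	grep -rl '^\.TQ' "${1:-.}" | wc -l
}

# Every matching line is one call, in whichever file it occurs.
count_tq_calls() {
	grep -r '^\.TQ' "${1:-.}" | wc -l
}

# Only run when given a directory, e.g.: sh count-tq.sh man/
if [ "$#" -gt 0 ]; then
	echo "pages: $(count_tq_pages "$1")"
	echo "calls: $(count_tq_calls "$1")"
fi
```

Counting files with grep -l rather than deduplicating grep -H output keeps the page count correct even if a page mixes, say, `.TQ` and `.TQ` followed by a comment.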

Well, here goes my experiment, based on Branden's one.

$ cat nest-of-rats.man 
.TH foo 1 2023-10-23 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
.TP
tag1
.TQ
another1
Prepare the turbo encabulator.
.RS
.RS
.EX
$ \c
.B "turbenc \-\-start"
.EE
.RE
It's up and running fine now.
.RE
.TP
tag2
.RS
Shut down the turbo encabulator.
.RS
.EX
$ \c
.B "turbenc \-\-halt"
.EE
.RE
It slowly grinds to a halt.
.RE
.P
All done now.

I just added the .TQ for another1.

$ mandoc -Ttree ./nest-of-rats.man
title = "foo"
sec   = "1"
vol   = "General Commands Manual"

Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

2023-10-23 Thread Ingo Schwarze
Hi Alejandro,

Alejandro Colomar wrote on Mon, Oct 23, 2023 at 04:30:58PM +0200:

> This got me curious about TQ, since mandoc(1) considers it "very rarely
> used, even in GNU pages".
> 
> Ingo, you may want to reword that, since TQ was being used in the Linux
> man-pages project,

Done, thanks for the heads up.
I append the resulting commit below.

> and yesterday I wrote a patch to use it even more:
> 

Strange, i pulled from
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages
and don't see such changes there, so i'm just judging from
code inspection right now, without looking at formatted versions.

I think that is a really bad patch.

 1. It gratuitously makes the description of almost every option
longer by a whole line, which is a significant waste of screen
real estate.  It's further aggravating that due to long options,
most Linux manual pages already have an extra line for the
options at the start of each paragraph.  Now you are
doubling that from one to two wasted lines per option
compared to the same functionality on BSD.  The situation
becomes even more dire because Linux already tends to have
many low-utility options compared to BSDs, so you keep driving
density of useful information down and verbosity and fluff up.

 2. Your argument that this helps searching is a red herring.
The weakness of man(7) in searching has nothing to do with
the *formatting* of list item heads, it is caused by the
lack of *semantic markup*.  There would be no problem creating
search anchors for multiple .Fl topics on the same output line
in mdoc(7).  The only reason i did not do it is because it
is irrelevant for us since we barely have any of those
POSIX-violating long options.

 3. Separating two synonymous .Fl entries onto different .TP/.TQ
lines weakens semantic expressiveness further.  Even though
mandoc already contains special guessing logic in the HTML
formatter to treat .TQ and following .TP as part of the same
list, other formatters and other output modes may be less smart.
I mean, those are not even the same macros, and yet you hope
for them to be recognized as entries in the same list?

 4. Even the best HTML markup that is so far feasible results
in *two* list entries, one with an empty body and only the
second one with the corresponding content.  While that may or may
not look superficially right (depending on the CSS), it certainly
isn't semantically correct and is likely to cause accessibility
grief, for example for blind people.
So, you want HTML formatters (and formatters to other output
languages that support semantic markup) to combine .TQ and
subsequent .TP not only into the same list, but also into the same
list element - but only if the body of the .TQ is empty, i guess?
So you want different macros to behave identically in some
ways (.TQ and .TP part of the same list) but the same macro
fundamentally different depending on its content.  The same
macro sometimes gets its own element and sometimes needs to
fuse into a different macro.
So much fun for implementers of formatting modules!

 5. On top of all that, i have a hard time thinking of any macro
that has a more wicked failure mode than .TQ in case the
formatter does not support it.  The output visually looks
perfectly fine, and the reader gets no hint that the *most
important* information is missing.

Actually, part of the reason why i initially added that
additional warning about .TQ was that i felt uneasy about it:
less portable than for example .EX, less important than for
example .UR, no semantic benefit, purely presentational intent,
ad hoc house of cards on the foundation of .TP, which is
already shaky enough in its own right, and a terrible failure
mode.  So at that time in the past, i was quite happy to get
the impression that it was rarely used and stressed that in the
documentation, hoping people would behave reasonably and only
use it in exceptionally dire cases where really nothing else
could possibly help.

Either way, that sentence in our manual page clearly is no longer
true even without that particular patch, so it's gone now.

> TQ seems to be a sibling of TP.  Not sure if this will affect this
> -Thtml bug in some way; my experiments seem good, but they weren't
> exhaustive.

Yes, i believe mandoc -T html is already able to more or less
cope with typical use cases of .TQ.  It seems likely though that
you can construct less typical use cases that render badly to HTML
(or any other semantical markup language for that matter).  You can
be virtually certain that HTML output from any non-mandoc formatter
will be atrocious right now, and any future output to any semantical
markup language will almost certainly be bad initially unless w