Re: SY $3

G. Branden Robinson Thu, 09 Jan 2025 14:00:57 -0800

Hi Alex & Ingo,

At 2025-01-08T19:31:04+0100, Alejandro Colomar wrote:
> If you want an actual manual page where this would make sense, look at
> dl_iterate_phdr(3).


This is a better place to start than the specimen you offered first, in
my opinion.  As I've said elsewhere, I think the objective of man(7)
should be "to get a few nines of the job [of man page formatting] done".

mdoc(7) goes for 100%.  The impression I get from its advocates is that
anything that can't be reasonably achieved in its macro language can go
take a flying leap (into some other documentation format, presumably).

Hence its ambivalent relationship with tbl(1), for example.

https://cvsweb.bsd.lv/mandoc/tbl.c?rev=1.47&content-type=text/x-cvsweb-markup&sortby=date

(That said, I wholeheartedly endorse Ingo's view of docbook-to-man.)

>       int dl_iterate_phdr(
>               typeof(int (struct dl_phdr_info *info, size_t size, void *data))
>                   *callback,
>               void *data);

Here's how I'd lay that out using groff Git.

$ cat EXPERIMENTS/dl_iterate_phdr.3
.TH dl_iterate_phdr 3 2025-01-09 "groff test suite"
.SH Name
dl_iterate_phdr \- walk an ELF object yadda yadda yadda
.SH Synopsis
.B int
.SY dl_iterate_phdr (
.BI typeof(int\~struct\~dl_phdr_info\~* info ,
.BI size_t\~ size ,
.BI void\~* \~date ))
.BI * callback ,
.BI void\~* data );

Here's how that formats using the default line length.

$ nroff -ww -r CHECKSTYLE=4 -man EXPERIMENTS/dl_iterate_phdr.3
dl_iterate_phdr(3)          Library Functions Manual          dl_iterate_phdr(3)

Name
     dl_iterate_phdr - walk an ELF object yadda yadda yadda

Synopsis
     int dl_iterate_phdr(typeof(int struct dl_phdr_info *info, size_t size,
                         void * date)) *callback, void *data);

groff test suite                   2025‐01‐09                 dl_iterate_phdr(3)

And here it is using the traditional (and minimum practical) line length
of 65n.

$ nroff -ww -r CHECKSTYLE=4 -r LL=65n -man EXPERIMENTS/dl_iterate_phdr.3
dl_iterate_phdr(3)   Library Functions Manual  dl_iterate_phdr(3)

Name
     dl_iterate_phdr - walk an ELF object yadda yadda yadda

Synopsis
     int dl_iterate_phdr(typeof(int struct dl_phdr_info *info,
                         size_t size, void * date)) *callback,
                         void *data);

groff test suite            2025‐01‐09         dl_iterate_phdr(3)

I get no warnings, style or otherwise, and the formatting looks fine to
me.

But I think I see what you're talking about.

You want to impose an additional constraint on the formatting, such that
each formal argument to the function is typeset on one line.

I'm not sure that's a reasonable goal.  In 1979 when Doug McIlroy wrote
man(7), no one would have dreamed of using the lengthy identifiers we
have now.  Indeed, it apparently took the immediate and explosive
popularity of the curses library of 4BSD (1980), which merrily gobbled
up symbols from the global C symbol name space like "OK", "ERR", "TRUE",
"FALSE", "move", and "refresh" for people to realize that, hmm, the name
space is something that might be wanting curation, informally at least.

We know that no one dreamed of it because even a decade later, with ANSI
C, the wise men stroked their long beards when considering the problem
and said, "no, the supported length for symbols with external linkage is
still only 6, just like the IBM linkers of old; you can stick a prefix
on that, and fight it out among yourselves".[1]

Linker vendors must have been the crustiest, most hidebound people on
earth in those days.  I guess they had previously been compiler people
who switched focus after encountering Ada in the early 1980s.  ("Boo
hoo, the language wants us to do static analysis.  So unreasonable.")

> There are a few other ones too (some pthread_*() functions have such
> long function names that I need to wrap the first parameter).

I award the Prolixity Prize to Erlang, and memorialized it in a test
script.

  .TH CosNotifyChannelAdmin_StructuredProxyPushSupplier 3erl
  2021-05-31 "groff test suite" "Erlang Module Definition"
  .SH Name
  CosNotifyChannelAdmin_StructuredProxyPushSupplier \- OMFG

At 2025-01-08T20:57:04+0100, Ingo Schwarze wrote:
> New syntax ought to support semantic markup.

Broadly agree here.  Except for my planned "keep macros", `KS` and `KE`,
which mandoc(1) can harmlessly ignore forever if it wants.

> So you are talking about a combination of very long command names
> with very long arguments causing ugly formatting by overrunning the
> right margin (in some output modes).
> 
> None of that requires author intervention to solve because if desired,
> the macro set can automatically detect overruns and take appropriate
> action.

That, and the man page author can specify breakpoints with the `\:`
escape sequence, which is blessed as portable among man(7)
implementations that are actually maintained.

groff_man_style(7):
     \:        Insert a non‐printing break point.  A word can break at
               such a point, but a hyphen glyph is not written to the
               output if it does.  The remainder of the word is subject
               to hyphenation as normal.  You can use \: and \% in
               combination to control breaking of a file name or URI or
               to permit hyphenation only after certain explicit hyphens
               within a word.  See subsection “Hyperlink macros” above
               for an example.

               \: is a GNU extension also supported by Heirloom Doctools
               troff 050915 (September 2005), mandoc 1.13.1
               (2014‐08‐10), and neatroff (commit 399a4936, 2014‐02‐17),
               but not by Plan 9, Solaris, or Documenter’s Workbench
               troffs.

(I repeat my hedge from a recent thread as to whether/how much Plan 9
troff is maintained.  Solaris 10 and DWB troffs are absolutely not.
Illumos troff could be but isn't.)
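
To make the effect of `\:` concrete, here is a minimal Python sketch of
how a formatter might split one otherwise unbreakable word at such break
points.  It is an illustration only, not groff's implementation; the
name `split_at_break_points` and the 29-column budget are assumptions of
mine.

```python
BREAK = r"\:"  # the roff escape sequence marking a permitted break point


def split_at_break_points(word, room):
    """Split `word` so that the head fits in `room` columns.

    The word may break only where a break-point marker appears, and no
    hyphen is added at the break.  Returns (head, tail); the tail keeps
    its remaining markers so it can be split again on the next line.
    """
    pieces = word.split(BREAK)
    head = ""
    for i, piece in enumerate(pieces):
        if len(head) + len(piece) > room:
            # This piece does not fit: everything from here on moves to
            # the next output line.
            return head, BREAK.join(pieces[i:])
        head += piece
    return head, ""


# A function-pointer parameter with one break point before its
# argument list; spaces inside it stand for unbreakable `\~` spaces.
word = r"void (*remove_session_cb)\:(SSL_CTX *ctx,"
print(split_at_break_points(word, 29))  # too narrow: breaks at the marker
print(split_at_break_points(word, 40))  # wide enough: stays in one piece
```

If even the first piece is too wide, the head comes back empty and the
whole word moves on unbroken, which is roughly the situation a "cannot
break line" warning reports.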

> A particularly simple way to achieve that would be to build a maximum
> indentation into .SY and let man(7) wrap the line before the arguments
> if the length of the command name exceeds that maximum, similar to
> what the groff_man(7) manual page describes for .TP, except that a
> modern language should not allow the document author to manually
> specify the width like .TP does - at least not for a macro that is
> intended to be semantic, like .SY.

Agree.  But also if I'm understanding you correctly, that is already the
way the formatter works.

roff(7):
     Once an output line is full, the next word (or remainder of a
     hyphenated one) is placed on a different output line; this is
     called a break.  In this document and in roff discussions
     generally, a “break” if not further qualified always refers to the
     termination of an output line.  When the formatter is filling text,
     it introduces breaks automatically to keep output lines from
     exceeding the configured line length.  After an automatic break, a
     roff formatter adjusts the line if applicable (see below), and then
     resumes collecting and filling text on the next output line.

groff man(7)'s `SY` macro disables adjustment (because traditionally, no
one typesets synopses with adjustment), so you won't get a warning when
a line can't be adjusted, a problem that long unbreakable identifiers
threaten in other contexts.
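
As a rough model of what filling does inside a synopsis, here is a
Python sketch of greedy filling with a hanging indent.  It is not how
groff is implemented; `fill_synopsis`, its parameters, and the 5n base
indent are assumptions chosen to match the nroff output shown earlier.
Each "word" is a string that must not be broken (its internal spaces
stand for `\~`).

```python
def fill_synopsis(prefix, words, line_length, base_indent=5):
    """Greedily fill `words` after `prefix`, without adjustment.

    Continuation lines hang under the first word after the prefix.
    The first word always abuts the prefix, even if that overruns.
    """
    hang = " " * (base_indent + len(prefix))
    lines = []
    current = " " * base_indent + prefix
    first = True
    for word in words:
        if first:
            current += word
            first = False
        elif len(current) + 1 + len(word) <= line_length:
            current += " " + word
        else:
            # Output line is full: break, then resume on the next line.
            lines.append(current)
            current = hang + word
    lines.append(current)
    return lines


# Words copied verbatim from the formatted dl_iterate_phdr experiment.
words = [
    "typeof(int struct dl_phdr_info *info,",
    "size_t size,",
    "void * date))",
    "*callback,",
    "void *data);",
]
for line in fill_synopsis("int dl_iterate_phdr(", words, 65):
    print(line)
```

With those words, the sketch breaks in the same places as the 80n and
65n outputs of my dl_iterate_phdr experiment.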

> So, let's break the line before the first parameter if it would overrun
> the right margin (-rLL=NNn), and automagically calculate an appropriate
> indentation for the first parameter.
> 
> As for the right indentation, I'd make it the exact indentation that
> would make the first parameter touch the right margin, with a minimum
> indentation of 2n (being such a rare case, I'd hardcode this value; it
> shouldn't be hit under normal conditions).  Let's write some examples:
> 
>       foo baaar  |
>       foo baaaar |
>       foo baaaaar|
>       foo        |
>          baaaaaar|
>       foo        |
>         baaaaaaar|
>       foo        |
>         baaaaaaaar            << Overruns the right margin.
>       foo        |
>         baaaaaaaaar           << Overruns the right margin.

<blink>

I think that _at most_ I'd be willing to add another formatting-time
style register for this.  I don't want the man(7) language in which
documents are composed to carry this freight.  It's too fiddly and
subjective.
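
For concreteness, here is my reading of Alex's rule as a small Python
sketch.  It only restates his examples; the function name, the
parameter names, and the 11-column line length (taken from where his
`|` marks sit) are my own assumptions, and nothing like this exists in
groff.

```python
def place_first_param(name, param, line_length, min_indent=2):
    """Lay out a name and its first parameter per the proposed rule.

    If both fit on one line, keep them together.  Otherwise break
    before the parameter and indent it so that it touches the right
    margin, but never by less than `min_indent` columns.
    """
    one_line = f"{name} {param}"
    if len(one_line) <= line_length:
        return [one_line]
    indent = max(line_length - len(param), min_indent)
    return [name, " " * indent + param]


for param in ("baaaaar", "baaaaaar", "baaaaaaar", "baaaaaaaar"):
    for line in place_first_param("foo", param, 11):
        # Flag lines that run past the right margin.
        note = "  << overruns" if len(line) > 11 else ""
        print(line + note)
```

Even this toy version needs two tunable numbers and a special case,
which is the kind of fiddliness I mean.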

At 2025-01-09T20:07:34+0100, Ingo Schwarze wrote:
> Sounds better, but still not like a fully thought-through analysis of
> the problem.  For example, it's not necessarily the first argument
> that is long.  Consider this real-world example from an actual manual
> page:
> 
>   void
>   SSL_CTX_sess_set_remove_cb(SSL_CTX *ctx,
>       void (*remove_session_cb)(SSL_CTX *ctx, SSL_SESSION *));

Good example.

The already documented example of bsearch() should have primed the
ambitious synopsizing page author to consider that.

groff_man_style(7):
     We might synopsize the standard C library function bsearch(3) as
     follows.

            .P
            .B void *\c
            .SY bsearch (
            .BI const\~void\~* key ,
            .BI const\~void\~* base ,
            .BI size_t\~ nmemb ,
            .BI int\~(* compar )\c
            .B (const\~void\~*, const\~void\~*));
            .YS

     man produces the following result.

            void *bsearch(const void *key, const void *base,
                          size_t nmemb, int (*compar)(const void *,
                          const void *));

You can see right there that I don't have a problem with a formal
argument of pointer-to-function type breaking across a line.  In fact, I
regard it as likely (when it even comes up).

But then, I seem to remember Alex has said repeatedly that he hasn't
actually gotten around to _reading_ groff_man_style(7) yet...

> By the way, in mdoc(7), writing that is totally straightforward
> for the documentation author:
> 
>   .Ft void
>   .Fo SSL_CTX_sess_set_remove_cb
>   .Fa "SSL_CTX *ctx"
>   .Fa "void (*remove_session_cb)(SSL_CTX *ctx, SSL_SESSION *)"
>   .Fc
> 
> The mdoc(7) language automatically breaks the line before the long
> argument, even though it's the second one, and proceeds with an
> indentation of 4n.

$ cat EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man
.TH SSL_CTX_sess_set_remove_cb 3 2025-01-09 "groff test suite"
.SH Name
SSL_CTX_sess_set_remove_cb \- use world's most brilliantly designed API
.SH Synopsis
.B int
.SY SSL_CTX_sess_set_remove_cb (
.BI SSL_CTX\~* ctx
.BI void\~(* remove_session_cb ")(SSL_CTX\~*ctx, SSL_SESSION\~*)"
.B )

That's how I'd set it in man(7).

Results at 80n and 65n:

$ nroff -ww -rCHECKSTYLE=4 -man EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man
SSL_CTX_se..._remove_cb(3)  Library Functions Manual  SSL_CTX_se..._remove_cb(3)

Name
     SSL_CTX_sess_set_remove_cb - use world’s most brilliantly designed API

Synopsis
     int SSL_CTX_sess_set_remove_cb(SSL_CTX *ctx
                                    void (*remove_session_cb)(SSL_CTX *ctx,
                                    SSL_SESSION *) )

groff test suite                   2025‐01‐09         SSL_CTX_se..._remove_cb(3)

$ nroff -ww -rCHECKSTYLE=4 -r LL=65n -man EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man
troff:EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man:8: warning [page 1, line 9]: cannot break line
SSL_CTX...move_cb(3) Library Functions ManualSSL_CTX...move_cb(3)

Name
     SSL_CTX_sess_set_remove_cb  -  use  world’s most brilliantly
     designed API

Synopsis
     int SSL_CTX_sess_set_remove_cb(SSL_CTX *ctx
                                    void (*remove_session_cb)(SSL_CTX *ctx,
                                    SSL_SESSION *) )

groff test suite            2025‐01‐09       SSL_CTX...move_cb(3)

Aha!  I finally had a problem!

So I make one change:

$ diff -u EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man{,.new}
--- EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man  2025-01-09 15:16:11.737805662 -0600
+++ EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man.new      2025-01-09 15:19:00.653304820 -0600
@@ -5,5 +5,5 @@
 .B int
 .SY SSL_CTX_sess_set_remove_cb (
 .BI SSL_CTX\~* ctx
-.BI void\~(* remove_session_cb ")(SSL_CTX\~*ctx, SSL_SESSION\~*)"
+.BI void\~(* remove_session_cb ")\:(SSL_CTX\~*ctx, SSL_SESSION\~*)"
 .B )

$ nroff -ww -rCHECKSTYLE=4 -r LL=65n -man EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man.new
SSL_CTX...move_cb(3) Library Functions ManualSSL_CTX...move_cb(3)

Name
     SSL_CTX_sess_set_remove_cb  -  use  world’s most brilliantly
     designed API

Synopsis
     int SSL_CTX_sess_set_remove_cb(SSL_CTX *ctx
                                    void (*remove_session_cb)
                                    (SSL_CTX *ctx, SSL_SESSION *)
                                    )

groff test suite            2025‐01‐09       SSL_CTX...move_cb(3)

And if the closing paren stranded on a line by itself is an annoyance--
though I'd consider the fact that it also clarifies that the previous
argument is of pointer-to-function type--I can prevent that break too.

$ diff -u EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man.new{,2}
--- EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man.new      2025-01-09 15:19:00.653304820 -0600
+++ EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man.new2     2025-01-09 15:22:02.696758082 -0600
@@ -5,5 +5,4 @@
 .B int
 .SY SSL_CTX_sess_set_remove_cb (
 .BI SSL_CTX\~* ctx
-.BI void\~(* remove_session_cb ")\:(SSL_CTX\~*ctx, SSL_SESSION\~*)"
-.B )
+.BI void\~(* remove_session_cb ")\:(SSL_CTX\~*ctx, SSL_SESSION\~*))"

$ nroff -ww -rCHECKSTYLE=4 -r LL=65n -man EXPERIMENTS/SSL_CTX_sess_set_remove_cb.man.new2
SSL_CTX...move_cb(3) Library Functions ManualSSL_CTX...move_cb(3)

Name
     SSL_CTX_sess_set_remove_cb  -  use  world’s most brilliantly
     designed API

Synopsis
     int SSL_CTX_sess_set_remove_cb(SSL_CTX *ctx
                                    void (*remove_session_cb)
                                    (SSL_CTX *ctx,
                                    SSL_SESSION *))

groff test suite            2025‐01‐09       SSL_CTX...move_cb(3)

My takeaway from this is a lesson that all typographers seem to acquire
and that Alex, as a detail-oriented person, should too:

At some point in typesetting we depart the realm of what is procedurally
correct in all circumstances and run into corner cases where
individualized judgments must be made, balancing semantic clarity with
typographic artistry.

I'd furthermore articulate the principle that if something is inelegant
but clear, even if it requires close reading to yield that clarity, keep
it as-is before contorting it in the pursuit of elegance.

> Probably, simply always using 4n would look better and more uniform.

It is not, however, traditional in man pages.  Still, I think it would
be easy to support--it sounds like the same feature as above ("I think
that _at most_ I'd be willing to add another formatting-time style
register for this.").

> With flush-right, you might get very large indentations that look
> weird.  Besides, KISS! - especially considering that name+argument
> overflow is an unusual edge case in the first place.

Good advice.

> Also, while indentation conventions vary among projects (for example,
> BSD uses 8n tabs for statements and 4n for continuations of the same
> statement on the next line, whereas groff source code tends to use 2n
> throughout IIUC),

For stuff Clark originally wrote, yes.  Some later contributors, even to
files Clark authored, didn't respect his indentation convention (Werner
Lemberg, I hasten to add, was _not_ one of these people).

Some code that originates elsewhere (like BSD) or lives in the contrib
directory doesn't follow Clark's conventions, understandably IMO.

I try to respect whatever the prevailing convention is.  When I have to
break a long line in Clark's C/C++ and need to indent it _and_ am not
already in a parenthetical context, I use the previous line's indent+4n.

As a rule.

> If you really want to make the indentation variable in this special
> case of name+argument overrun (rather than just using 4n), then
> constraining it in the range from 2n to 8n inclusive would make
> sense to me because I would consider tab settings outside that
> range highly unusual in any source code formatting convention.

I don't see any reason that a C function's synopsis in a man page has to
exactly duplicate the appearance of its declaration in source code.

There are several problems with pursuing a false equivalence here.

1.  C/C++ prototypes/declarations generally start in column 1.  A man
    page synopsis will not.

2.  Some code wants the function/symbol name to start in column 1,
    pushing a function's return type to rest alone on the previous line.
    This is to make them easy to grep(1) (when ctags(1) or similar is
    unavailable, unused, or eschewed by the callow).  But you don't
    grep(1) man pages this way.

3.  Man page text needs to be adaptable to variable line lengths within
    a reasonable range (65-80n, I say).  A code project is either the
    Wild West or has a single mandatory line length that is enforced on
    all its developers using whips and thumbscrews.

4.  Some C/C++ language styles omit the names of formal parameters in
    declarations (cf. definitions), recording only their types.
    Stroustrup is famous for this, and Clark followed that convention in
    much of his code.[2]  I personally disagree with it, but more
    importantly, the practice seems to be much rarer in man pages.  I
    suspect this is because a function's man page often wants to
    _discuss_ its formal arguments in an unambiguous manner.

    Consider this synopsis:

      char *strstr(const char *, const char *);

    Doesn't illuminate much, does it?

Regards,
Branden

[1] https://stackoverflow.com/questions/38035628/c-why-did-ansi-only-specify-six-characters-for-the-minimum-number-of-significa

    "peterh" gives the reasonable-sounding, battle-hardened veteran's
    answer.  But we also see John Mashey show up to claim credit
    (blame?) for the first widely adopted extensions to Nils-Peter
    Nelson's string.h: strncat and strncpy.  I guess strncmp and strnlen
    came later, possibly from other hands.

    https://minnie.tuhs.org/cgi-bin/utree.pl?file=pdp11v/usr/include/string.h

[2] I suspect this is so that he could more easily tell by eyeballing a
    header file when he was attempting a function overload that would be
    invalid because it was duplicative.

    C++ was initially developed when compilers produced as few
    diagnostics as possible.  Further, the objective of a compiler was
    to do its damnedest to produce assembly output--ANY assembly
    output--and not to sweat trivialities like the correctness of the
    input program.  So Stroustrup maybe couldn't count on his compiler
    (or even his own Cfront) to warn him if he had colliding overloads.

    Regardless, I think it was a bad tradeoff.  One way experienced
    programmers learn an API is by reading the function declarations.
    The creator of an API can help people out a lot by picking
    meaningful names not just for symbols but for formal arguments.  And
    in a man page, a formal argument's name doesn't even need to be a
    valid C identifier; that's why I recommend hyphenated noun phrases
    for them.  That makes them convenient to discuss in the text of the
    man page.

    Of course, neither of these practices is the Rock Star Way.
