Re: ripgrep author seems happy with groff_man_style(7)

Ingo Schwarze Tue, 21 Jan 2025 22:11:34 -0800

Hi Branden,

i'll try to focus on the most pertinent, most technical points,
and only respond selectively to political arguments, mostly where
i fear that misperceptions might arise.

G. Branden Robinson wrote on Tue, Jan 21, 2025 at 01:19:08PM -0600:
> At 2025-01-21T16:05:18+0100, onf wrote:
>> On Tue Jan 21, 2025 at 6:59 AM CET, G. Branden Robinson wrote:

[...]
> I want to implement a means to run the formatter in a mode that will
> dump all of the tags in a document--which need not require anything
> but changes to "an.tmac"--in a format useful for another tool to consume.
> That tool will probably be less(1).

For what it's worth, that is exactly what mandoc(1) has been doing since
July 2019, first released with mandoc-1.14.6 in September 2021.

Except that it does not require a different "mode" but that this always
happens if and only if a pager is used and that pager is more(1)
or less(1).  It does not require the user to provide any new
configuration or options.  It happens completely automatically
and transparently whenever the user types "man pagename" or
"apropos -a search_expression".

> I don't know what tag format it uses; even if it's ctags(1), that
> shouldn't be a problem, because at the time the document is formatted,
> it knows what the current output line number is.

Yes, the format less(1) uses and mandoc(1) produces is ctags(1),
which is a line-oriented text file using the line format
"tagname filename linenumber".  Even though the POSIX standard
for ctags(1) permits "search expressions" as an alternative to
line numbers, that's irrelevant for the present purpose precisely
for the reason you point out: we know the line numbers, so using
search expressions would be nothing but a waste of processor time.
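
As a minimal sketch of the line format described above (an illustration
only, not code taken from mandoc(1) or less(1)), one such line can be
parsed like this.  The `Tag` type and the `parse_tag_line` helper are
hypothetical names, and splitting on whitespace is an assumption based
on the sample output shown later in this mail:

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """One entry of a ctags-style file: "tagname filename linenumber"."""
    name: str
    filename: str
    line: int


def parse_tag_line(text: str) -> Tag:
    # Split from the right so that exactly the last two fields are
    # taken as the file name and the line number.
    name, filename, line = text.rsplit(None, 2)
    return Tag(name, filename, int(line))


# Sample line in the shape shown in this mail.
tag = parse_tag_line("echo /tmp/man.DNP7iW4MJE 1260")
print(tag.name, tag.filename, tag.line)
```

Because the line number is already known when the page is formatted,
nothing in this sketch needs to search the rendered text.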

[...]
> Hence the value of automatic tagging.  There is the problem
> that we don't want the human man page reader to have to key
> in a fully qualified tag name all the time.

Try the opposite perspective.
KISS the tags such that they are trivial to type in.

> All they need is an unambiguous match.

No you don't need that.  Do NOT even try to make them unique.
It won't work in practice because even within the same manual page,
the same identifier can legitimately be defined at two different
places, and at both in an authoritative manner.  A mathematician
would be horrified to have two definitions for the same term -
a practitioner, not so much:

  https://man.openbsd.org/ksh.1#echo
  https://man.openbsd.org/ksh.1#echo~2

The first defines the "echo" builtin command in POSIX mode,
the second in default mode.  For terminal output, you don't have
the HTML restriction that anchors must be unique, so you get:

   $ grep echo /tmp/man.oyHNMTMyV6
  echo /tmp/man.DNP7iW4MJE 1260
  echo /tmp/man.DNP7iW4MJE 1512

I'd say from the end user perspective, that is more usable than
with unique tags, and additionally does not suffer from the
problem that editing the manual page might change the meaning
of "echo~2".

You have another example in ksh(1) defining the tag "h" three
times, see below:  the \h escape in the $PS1 variable, the -h option
to the "set" builtin command, and the -h option to the "test"
builtin command.  In long manual pages, such cases aren't all
that uncommon.
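
A small sketch of why duplicate tags are harmless for a consumer of the
tags file: a reader can simply collect every line number recorded for a
name and step through them in order.  This is an assumption-laden
illustration, not how less(1) is implemented; `index_tags` is a
hypothetical helper, and the two input lines are the ones from the grep
output above:

```python
from collections import defaultdict


def index_tags(lines):
    """Map each tag name to all line numbers where it is defined."""
    index = defaultdict(list)
    for text in lines:
        # Line format: "tagname filename linenumber"
        name, _filename, line = text.rsplit(None, 2)
        index[name].append(int(line))
    return index


tags_file = [
    "echo /tmp/man.DNP7iW4MJE 1260",
    "echo /tmp/man.DNP7iW4MJE 1512",
]
index = index_tags(tags_file)

# Both definitions stay reachable under the same short name.
for position, line in enumerate(index["echo"], start=1):
    print(f"echo: match {position} of {len(index['echo'])} at line {line}")
```

Neither entry needs a suffix like "~2", so editing the page cannot
silently change which definition a given tag name refers to.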

> I don't know what less(1)'s facilities are for supporting this.

The less(1) program does not support partial tags or glob(7)ing
or regular expression search for tags, neither in :t nor in -t,
so unless you plan to become a less(1) hacker, you have no choice
anyway.

Apart from the fact that they would be useless with less(1),
not going for long, complicated, unique tags also avoids
the horrific overengineering that would likely ensue.

> I also don't happen to know how ctags(1) got extended to support
> C++ name spaces and other means of qualifying colliding identifier
> names.  But if ctags (perhaps Exuberant ctags, given the original
> ctags format's advanced age) got extended to cope with that problem,
> presumably less(1) learned how to interpret the extension.

Not that i'm aware of anything like that, no.  Also, given that ctags(1)
is mandated by POSIX and less(1) aims for portability (AFAIK),
relying on extensions might be a bad idea.

> Also, presumably mandoc(1) has solved the problem when it renders
> multiple pages and more than one supports, say, an `-h` option.

  $ man -ak Fl=h

and then typing ":th" and repeatedly pressing 't' gets me to:

     -h fmt  Audio file type.  The following file types are supported:
               # in aucat(1)

     -h      Treat symbolic links like other files: modify links instead of
             following them.  The -h and -R options are mutually exclusive.
               # in both chflags(1) and chmod(1)

     -h hashfile
             Place the checksum into hashfile instead of stdout.
               # in cksum(1)

     -h history
           Read log from history file history instead of the default
           /usr/ports/distfiles/history.  Turns on -nv, as this is a testing
           option.
               # in clean-old-distfiles(1)

     -h      Compress spaces into tabs.  This is the default behavior.
               # in col(1)

     -h      Print a short help message.
               # in compress(1)

     -h      Dump the CTF header.
               # in ctfdump(1)

     -h      "Human-readable" output.  Use unit suffixes: Byte, Kilobyte,
               # in both df(1) and du(1)

     -h      Display a brief summary of command line arguments and options.
               # in dig(1)

     -h hosts     File with hosts to use for building.  One host per line,
                  plus properties, such as:
               # in dpb(1)

     -h heads
             Number of floppy heads (1 or 2).
               # in fdformat(1)

     -h      Causes symlinks not to be followed.  This is the default.
               # in file(1)

     -h      An alias for the -L option.  This option exists for backwards
             compatibility.
               # in find(1)

     -h      If the -s option is also specified, the name of the remote host
             is displayed instead of the office location and office phone.
               # in finger(1)

     -h      Generates a help summary of flex's options to stdout and then
             exits.  -? and --help are synonyms for -h.
               # in flex(1)

     -h      Never print filename headers (i.e. filenames) with output lines.
               # in grep(1)

At this point, the status line at the bottom of the window says:

        /tmp/man.H8ZRqrZkmW (tag 21 of 103)

Here are a few more interesting cases that repeated 't' shows later,
but as opposed to the above, which is complete, what follows is
selective:

                \h            The hostname, minus domain name.
             -h | trackall    Create tracked aliases for all executed commands
             -h file            file is a symbolic link.
               # all three in ksh(1)

     h | H
           Help: display a summary of these commands.  If you forget all the
           other commands, remember this one.
               # in less(1)

     ? | h      Display help message with all available commands.  There is
                also (simple) context-sensitive help available at most
                prompts.
               # in disklabel(8)

           h    RTF_CACHED       Referenced by gateway route.
               # in route(8)

> Maybe Ingo (or a mandoc(1) power user) would like me to save me the
> trouble of researching these points?

Hope this helps.  :-)

>> I agree it would be nice if one could link to subsections and, more
>> importantly, terms within other manpages. As a matter of fact though,
>> man(7) can't even tag terms within the same page.

> It's the same problem with the same solution, as I conceive it.

Not unreasonable in so far as all these tasks ought to be considered
together, and the designed solution ought to be uniform and consistent.

However, the final design needs to contain syntactical and operational
details that may differ depending on whether we are linking into the
same or into another manual page.  Coming up with a good design is
certainly hardest for deep linking into other pages - but also much
less frequently needed in practice than some seem to think, which
is the reason why mandoc does not support it yet.  There simply isn't
enough demand to make it a priority, even though it may be desirable
in a few cases.

> A recurring theme in my contributions to groff's man(7)/mdoc(7)
> rendering has been to solve problems when rendering N pages at a time,
> where N can be 1 but might be greater.

Actually, the mandoc program demonstrates that from a pure user
perspective, a very straightforward solution is feasible that
will look totally trivial to the user.

Then again, programs like makewhatis(8) in the mandoc package that run
over lots of manual pages in sequence have been quite good at exposing
memory leaks and bugs caused by leaking context data from one manual
page to the next, and i rarely enjoyed the ensuing crashes and
debugging efforts.  Still, running commands like

   $ man -ak ~.

can be useful for finding bugs.  By the way:

   $ man -ak ~. | col -b | grep -c ^NAME
  8801
   $ time man -ak ~.
    0m17.16s real     0m15.07s user     0m01.18s system
   $ time doas makewhatis 
    0m10.58s real     0m08.96s user     0m01.35s system

That's less than two milliseconds for rendering one manual page
on average, and 1.2 milliseconds on average for the parsing
involved in makewhatis(8) - which includes building mandoc.db(5)
database files referencing the mdoc macro arguments in all the
manual pages on the system.
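
The averages can be checked with a few lines of arithmetic.  This sketch
assumes that makewhatis(8) processed the same 8801 pages that the
grep -c ^NAME count reports, which the numbers above imply but do not
prove:

```python
# Figures copied from the transcript above.
pages = 8801                 # pages counted via grep -c ^NAME
render_seconds = 17.16       # real time of "man -ak ~."
makewhatis_seconds = 10.58   # real time of "doas makewhatis"

# Average wall-clock time per page, converted to milliseconds.
render_ms = render_seconds / pages * 1000
index_ms = makewhatis_seconds / pages * 1000

print(f"rendering:  {render_ms:.2f} ms per page")
print(f"makewhatis: {index_ms:.2f} ms per page")
```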

 $ ls -alh /usr/*/man/mandoc.db
-rw-r--r--  1 root  wheel   190K Jan 21 22:22 /usr/X11R6/man/mandoc.db
-rw-r--r--  1 root  wheel   540K Jan 21 22:22 /usr/local/man/mandoc.db
-rw-r--r--  1 root  wheel   2.3M Jan 21 22:22 /usr/share/man/mandoc.db

Those times were taken on my notebook that is almost 10 years old by
now and still uses a rotating harddisk, not an SSD.

Try that with groff at your own peril.

[...]
>> I'm not saying mdoc is perfect; it certainly doesn't afford me
>> the level of control I am used to from writing plain *roff, but it
>> pays off in the language's descriptiveness, relative ease of use,

> I think (1) its macro lexicon is too large;

I actually agree; and that's why i have deprecated several macros
over the years and only added exactly one, .Tg, and documented
that .Tg is intended to be used very sparingly and only useful
in unusual cases.

However, unlike with Texinfo and DocBook, the lexicon is not so large
that it causes a steep learning curve.

> (2) its DWIM-ish interpretation of isolated punctuation arguments,
> combined with its unique in-package macro interpreter, make it a poor
> base camp from which to make further forays into *roff document
> formatting;

Completely true.

> and (3) its community has too many Kool-Aid drinkers.  You're unusual
> in that you led a paragraph with an acknowledgement of its imperfection.

Has it?  And is that really unusual?

For example, have i ever denied that these non-deprecated macros
are badly designed?  .Bk .Lb .Bf .Eo .St .At .Bx .Bsx .Nx .Fx .Ox .Dx
(Just as an example.)

Saying "FOO is the best tool i know of." is not the same as
saying "FOO is perfect."

[...]
> It's important to understand the main reason (I surmise) mdoc was born.
> It's because AT&T/USL was getting increasingly litigious and nasty about
> the Unix copyrights.  That included the "tmac.an" file.

I doubt that tmac.an being non-free is the main reason, or even a
particularly relevant reason at all.  Between 4.3 and 4.4, the CSRG
essentially rewrote all parts of the kernel and userland and documentation
that hadn't been rewritten already.

Why would replacing the 267 line, 3941 byte file tmac.an have been
any kind of an obstacle while moving towards 4.4BSD?

> The Berkeley CSRG stopped contributing changes to the ms package,
> for instance, after 4.2BSD (1983), a full seven years before mdoc(7)
> appeared in 4.3BSD-Reno.

I never investigated ms(7) history, so i refrain from commenting
on that.

> mdoc(7) was born out of anger.  Not really at the difficulty of writing
> good man pages in man(7), but at AT&T for being dicks (which they were).
> The CSRG enlisted people, possibly on a paid basis (I'm not sure)
> to design and implement a replacement for man(7),

This seems inaccurate and misleading in multiple ways.

 * I am not aware of any signs of anger.  Which statements of which
   person back in those days give you the impression of "anger"
   in relation to the AT&T tmac.an file?
 * See the quotations below regarding the "difficulty of writing
   good man pages in man(7)" - it was a contributing factor,
   albeit not the main point.
 * I don't know about "people" in the plural.  For all i know,
   the only person in charge of designing and implementing the new
   language and rewriting the manuals was Cynthia Livingston -
   though others have probably contributed small bits here and
   there, as is common in free software development.
   But i seriously doubt anyone else was "enlisted".
 * It wasn't the CSRG that "enlisted" her, but Usenix.
 * The grant wasn't "to design and implement a replacement for man(7)",
   but to rewrite the bulk of non-free documentation text,
   and the new language was more like a by-product.
   When you rewrite everything from the ground up anyway, you can
   as well bring the tools up to modern standards while at it.

For the sake of combatting misinformation, let me quote from a
private message i received from Cynthia in 2014 without getting her
permission first.  I think there is nothing private in these
paragraphs and making it publicly known is in her best interest.
Besides, she sent this information to me in response to a mail
stating that i was working on a web site documenting roff and
mdoc history, so the information becoming public will certainly
not surprise her.

  What made mdoc possible for was a grant from Usenix to do the work.
  The work was done 1989-1991.  The task was to eliminate the copyrighted
  text of course but also to make the pages consistent.  There were 7
  different OS versions of man pages to go through and pull out all
  UCB copyright text.
  None of the tmac packages would help make the pages pretty and nice
  for both hardcopy and online formats and I realized that they were all
  owned by AT&T.  So mdoc was born.

So here we have it, straight from the horse's mouth:  The main problem
was the manual page *text* being non-free.  Second to that, a perceived
lack of output quality for the man(7) macro versions available back
then.  Copyright of the macro file itself was also a consideration,
but a minor one, more like an afterthought.

Cynthia continued:

  Flex was used to find key phrases, and convert those in the man pages,
  but had to use a highlighter on hard copy to really weed out the AT&T
  phrases.  I gave the hardcopy of that effort to Kirk cause he collects
  stuff and I didn't want to haul it around.  Any three words sequences
  that could be claimed copyright AT&T had to be modified.  After that
  the flex code grew into large macro converter tool.  Nothing I did
  ran fast.  Ask Kirk.

And a bit later:

  What made the macro package possible was groff.  I regret having had
  to make the work backward compatible with ditroff.  Not my decision.
  Would have loved to have rewritten the macros solely for groff.
  The package would have been smaller, simpler and efficient (faster,
  much faster.)

  At the time the project started Apple was promoting hyperlinks as the
  way of the future and that seemed like a cool idea.  Also an early ML
  out of Canada which I found to be a bit abhorrent - this is before
  mosaic and the www so it really was ugly.  The ML tried to document
  every nitty typesetting detail and required quite a learning curve.
  There had to be a simpler way, style sheets didn't exist yet.
  Programmers would never write a man page in that stuff!  I agonized
  over what would entice programmers to write nice man pages...
  This is where the semantic approach idea came from.
  It took me three revisions to the macros I was writing to get to the
  released version.

  One of my personal goals was to make it easy to convert the
  documentation to another format - figuring that hyperlink and MLs
  would be the future.  Regretfully - a convert to html tool and a C
  lib template to make incorporating documentation easier for
  programmers never happened. I did have a start on an html conversion
  but lost it somewhere after I started a new job.

So here we have it, mdoc(7) was designed with the same ideas in mind
that, shortly afterwards, led Tim Berners-Lee to publish the design
of HTML late in the year 1990.

> and, I suspect, to do so in such a way that the language could be
> reimplemented atop something that wasn't a *roff at all, because AT&T
> owned that too, hence the bespoke macro processing system.

That sounds like pure speculation to me, and also misleading.
Cynthia definitely had groff and liked using it, so AT&T owning
the original roff implementations cannot possibly have caused any
issues whatsoever.

> That did of course eventually happen.
> Much, much later I think than anyone at CSRG contemplated.

You can rightly credit Cynthia with a lot of foresight, but i have
never seen any indication that she, or anyone in the CSRG, foresaw
Kristaps Dzonsons' idea of implementing mdoc(7) without roff(7).
Besides, that happened 18 years later.  18 years is not a typical
delay for getting a project finished - not even in *BSD ;-).
So i think it's reasonable to assume that Kristaps' idea was
indeed a completely new idea and not reviving a stagnated project.

Look at the timeline:

 * April 12, 2006 Kristaps submitted his first patch that got
   committed to OpenBSD.  It was a two-line documentation improvement
   to the ktrace(2) manual page.
 * August 4, 2006 Kristaps submitted his second patch that got
   committed to OpenBSD, the last one before he became a developer.
   It was a one-word documentation improvement to the getfh(2) manual page.
 * Nov 22, 2008 Kristaps started mdocml development -
   on his own CVS server completely outside OpenBSD or any other BSD,
   more than two years after submitting two tiny patches to OpenBSD.
 * Feb 20, 2009 Kristaps started working on a terminal output mode,
   which wasn't included in his original plan, but a later idea of his
 * Mar 1, 2009 Joerg Sonnenberger created a port of mdocml in NetBSD
 * Mar 9, 2009 Ulrich Spoerlein created a port of mdocml in FreeBSD
 * Mar 15, 2009 Kristaps gave a talk at AsiaBSDCon
   entitled "deprecating groff for BSD manual display"
 * Mar 23, 2009 Kristaps started re-implementing the man(7) language in mandoc
 * Apr 6, 2009 Kristaps committed mandoc to OpenBSD base
   after being invited to become a developer
 * June 14, 2009 my first commit to mandoc in OpenBSD

So before Kristaps got his OpenBSD account, he was barely involved
in OpenBSD development at all.  I'm absolutely certain that
replacing groff was not part of his original idea because
i talked to him about that several times.  Initially, he only
wanted HTML output, which also resulted in the name "mdocml".
He only started terminal output three months after starting mdocml.

Consequently, insinuating that *BSD planned to supplant groff all
along during the more than 15 preceding years verges on a conspiracy
myth.

By the way, i recall talking to Theo de Raadt face to face
in the evening of May 30, 2009, during the OpenBSD c2k9 hackathon
in Edmonton.  During that hackathon, i did not yet work on mandoc,
but worked on C library code including callrpc(3), clntraw_create(3),
getgrent(3), svc_run(3), yp_bind(3), yp_first(3) and other low-level
functions in the libc/gen, libc/rpc, and libc/yp directories.

When i asked Theo about what he thought about me joining kristaps@'s
mandoc project, Theo's vibes were mostly "mandoc is a very cool idea,
I (Theo) also looked at the code and it's very nice and clean, I like it,
so go ahead if that's what you want to work on".  With that feedback,
i sent my first mail to kristaps@ the following day, May 31, 2009,
offering to help.

Theo also mentioned that less GPL code in the tree might eventually
become a potential side benefit, but that certainly wasn't central
to working on it, and that working on mandoc sounded reasonable even
if we would later decide to keep groff in base for good.  Using a
creative idea and working on clean code definitely was the main point.

> The irony is that GNU, of which the BSD community was not yet a sworn
> enemy, was developing a wholesale Free Software replacement for troff at
> just about the same time.  (BSD people were not alone in recognizing
> AT&T's dickish nature.)

Again misleading.  "The BSD community" can't really have a "sworn
enemy"; it's much too diverse for that.

There is a difference between declaring people "enemies" and
voicing opinions about technical questions.

Yes, i do think that the BSD license is more free than the GPL
because it imposes fewer restrictions on the use and continued
development of the code (and so will probably some other BSD
developers, but probably not all) - but that does not make
people publishing code under (in my eyes) not completely free
licenses like GPL or CDDL or Apache 2 my enemies.  To the
contrary, GPL code is normally useful when no truly free code
is available, so i'm grateful to lots of GNU developers for the
work they are doing.  I believe *that* sentiment is widespread
in BSD communities - though likely not universal among all
individuals in those communities.

> So by the time BSD proudly introduced its all-singing, all-dancing
> second-generation man page formatting package, GNU troff had already
> come along and not only reimplemented man(7) under a Free Software
> license, but the troff formatter, the preprocessors that some man pages
> used (tbl(1) foremost), and output drivers for PostScript and terminals.
> (This was 1990.  PDF and HTML didn't exist yet.)

Again misleading.  Not only was Cynthia aware of groff -
she liked it and preferred using it.  There is no irony in that.

Besides, i just checked: the first BSD release to ship groff
was 4.3BSD-Net/2 (August 1991), the one between Reno and 4.4BSD.
It included groff-1.01, and already included the tmac.an implementation
by James Clark.  That groff release predates the groff git repository
though, which starts with groff-1.02 (June 1991).

So when Cynthia was nearing completion of her work in late 1991,
she did have the GNU man(7) implementation.  I'm not so sure
that already existed when she started her work about two years
earlier, in 1989.

Either way, even if she had, the decision to instead start mdoc(7)
clearly wasn't NIH syndrome but had very specific design goals
unrelated to licensing.

[...]
I never investigated the history of BSDI and doubt it's very
relevant at this point.  I'm also not familiar enough with SUN
history to comment on what you say about them, but given all
the above, i wouldn't be too surprised if there were a few
inaccuracies in those sentences, too.

> GNU programs had made inroads to BSD because they were (1) free of
> charge, (2) free to hack on, and (3) tended to be actively developed
> with concern for their quality,

Yes, i believe that is true, even if (2) comes with a few restrictions
that tend to sometimes cause a bit of friction - but not so much
that using GPL code is impractical.  For many purposes, even a BSD
project can use stand-alone GPL programs.  Unless you want to link
binaries to it - then GPL code becomes unusable IIRC.

> And that, as I understand it, is the origin of copyleft cooties.

Maybe, maybe not, who knows about cooties :).
My reason for disliking Copyleft is that it cannot be integrated with
truly free source code, and two pieces of code under different Copyleft
licenses usually cannot even be combined with each other.  On the other
hand, i think the problem Copyleft is supposed to solve never really
existed: if somebody copies your code and adds non-free parts to it,
anybody who wants to continue free development still has the original
version.  The problem that you can pay people to abandon voluntary work
and instead develop non-free code simply cannot be solved by
Copyright licenses.

While you may disagree with these (and similar) arguments, they
are definitely not religious.

> Having been adopted, groff was slated to be thrown out again.

Not really.  Where exactly do you see evidence that any BSD
community, in the mid-1990s, already planned for replacing
groff, or GNU binutils, or GCC, or GDB, or whatever GNU
software you may have in mind?

At this point, i abandon answering your self-declared digression
in detail and return where you get back to groff and mandoc.

[...]
> It took about twenty years for that *roff-independent man page formatter
> to eventuate, and Ingo has had to make a lot of compromises--meaning,
> implement a lot more *roff features--than I think he or original author
> Kristaps Dzonsons ever imagined.  I've corresponded with both of them
> (the latter only about his "lowdown" project); they can correct me if
> I'm wrong.

No, that part is actually correct.  (Except, as explained elsewhere,
that the first time i see evidence of "replacing groff" as a goal is a
few months after Kristaps started mdocml - and started it outside BSD.)

[...]
> How we got here, why mdoc looks the way it does in some respects--much
> of that is not a happy story.  It would have been better for everyone,
> in my opinion, for the BSDs to say, "hey, there's a basically free *roff
> implementation that we _already distribute_ that has a man(7)
> implementation.  AT&T/USL will not be able to take that away from us.
> Let's make man(7) better, [...]

Cynthia disagrees.  Already in 1989, she correctly recognized that
man(7) implemented an outdated paradigm that failed to separate
semantic function from presentation, making it hard to achieve
good quality output across multiple output formats at the same time.

Inside OpenBSD, i'm one of the strongest proponents of the idea
"i won't gratuitously rewrite stuff from scratch, not even
when it is flawed, but i will instead strive to improve it in
small steps."  Several OpenBSD developers are much more aggressive
in that respect and throw away low-quality legacy code much more
easily to start over from scratch with a better overall design.

But in this case, even i have to say Cynthia was absolutely right.
When the fundamental paradigm of a language or program is totally
wrong or outdated, gradual improvements cannot fix it, and
starting with a completely new design is the only sensible choice.

[...]
> Now then--all of that said, I have no particular beef with the mdoc(7)
> language or the mandoc(1) formatter.  They are here, they exist, and,
> quite fortunately, Ingo is someone I can work easily with and who seems
> to reason about engineering decisions much in the same way I do.  So I
> aim to continue maintaining groff's mdoc(7) implementation as well as I
> am able--which, as with every other aspect of groff--may fall short of
> adequacy in the eyes of some.  There's a lot of work to do and limited
> time, notably once one subtracts that spent composing emails like this.

FWIW, i think you are doing a decent job maintaining the mdoc
implementation in groff.  In rare cases, i tend to think "now
that tweak verges on overengineering" but i would classify those
as minor differences in design style.

Updating the groff port in OpenBSD causes a horrendous amount of
work - so much so that i still didn't manage to update to groff-1.23
in the OpenBSD ports tree, mostly due to large amounts of subtle
changes in behaviour between 1.22.4 and 1.23.  Before being able to
update, i have to disentangle and classify all those changes as follows:

 1. Desirable effects of bugfixes and of changes making groff
    behaviour saner or more consistent; some of these require
    adjustments in mandoc, too, and/or maybe in the test suite.
 2. Trivial changes that aren't necessarily intentional, but
    only affect unimportant edge cases such that they can stay.
    A few of these may require adjusting mandoc, too, and
    several require adjusting the test suite.
 3. Build system hiccups.  These typically require setting
    configure or make(1) variables in the ports Makefile, but
    diagnosing what exactly is needed is typically hard.
    A few extreme cases might require patches.
    Due to extensive use of GNU autotools and gnulib, this
    class of problems is particularly large and annoying.
 4. Regressions.  These require pushing bugfixes to groff,
    plus patches in the ports tree until they get released.
 5. Intentional changes that we do not want to have.
    These require patches in the ports tree that may have
    to stay even in the longer term.

Each of these classes contains several items.
While i have already worked through quite a few issues, there are
still several issues that aren't even classified yet.

On OpenBSD, groff-1.23 and newer absolutely doesn't build out of
the box.  Quite to the contrary, getting it to build at all is a
serious challenge and getting harder and harder all the time, mostly
due to the fact that pervasive use of gnulib totally cripples
portability.  If gnulib were thrown out entirely and groff
merely assumed POSIX behaviour without providing any
fallbacks, porting it would become massively easier, probably almost
trivial.

I have done the following upgrades of groff in the OpenBSD ports tree:

 * from groff-1.15.4 to groff-1.21 on March 19, 2011
 * to groff-1.22.2 on March 30, 2013
 * to groff-1.22.3 on November 6, 2014
 * to groff-1.22.4 on December 24, 2018
 * now trying to get from there to groff-1.23.0

Of these five updates, the last one to 1.23.0 feels like by far
the hardest, both in terms of behaviour changes and build system
breakage all over the place, significantly harder even than the
big leap from 1.15 directly to 1.21.

Don't take that as a complaint, though; i still appreciate your work.

If there are lots of changes, we can at least be sure that work
is being done!  :-D

[...]
>>>         Because mdoc(7) culture is rigidly prescriptive, its section
>>>         headings are tightly controlled, and I expect that this
>>>         problem only threatens when subsections are used (and
>>>         referenced).

Not entirely accurate:

   $ man -cT lint ifconfig
  [... no output whatsoever, not even a style warning ...]
   $ echo $?
  0

So even though the page contains lots of custom sections, there
isn't a single warning.  For details of what that page looks like, see:

  https://man.bsd.lv/ifconfig.8

>> Although mdoc(7) says something to the effect of:
>>   For a list of conventional manual sections, see MANUAL STRUCTURE.
>>   These sections should be used unless it's absolutely necessary
>>   that custom sections be used.

That wording was committed by kristaps@ on July 19, 2010.
Well, "absolutely" feels too strong.  Something like

  Usually, stick to the conventional sections and only resort to
  custom sections in unusual circumstances and when there are very
  good reasons, which happens most often in unusually long and
  complicated manual pages.

would better express what i recommend.  I should probably fix this
wording, thanks for pointing it out.

>> ...in reality it itself uses non-standard section headings:
>>   Name
>>   Description
>>   Manual Structure
>>   Macro Overview
>>   Macro Reference
>>   Macro Syntax
>>   Compatibility
>>   See Also
>>   History
>>   Authors

Yes, that's an example of an unusually long and complicated manual
page with unusual needs.  Logically, Manual Structure ...
Macro Syntax are all subsections of the DESCRIPTION, but such a
gigantic DESCRIPTION wouldn't be ideal either.  Even with the
section split, the sections are still large, and making those
separate sections, among other benefits, allows using .Ss below
MACRO OVERVIEW and MACRO SYNTAX, which would be impossible if those
sections were .Ss below the DESCRIPTION.
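
As a rough sketch of that constraint (the subsection titles here
are made up for illustration, not copied from mdoc(7)): mdoc(7)
has only two heading levels, .Sh and .Ss, so a heading that is
already an .Ss has no level left below it:

  .Sh DESCRIPTION
  .Ss Macro overview
  .\" no third heading level exists for grouping macros here
  .Sh MACRO OVERVIEW
  .Ss Hypothetical macro group
  .\" as a section of its own, the overview can use .Ss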

This is actually a good example of where custom sections are
justified.

> Yes, some--not all--of those are unconventional.  I wouldn't say "not
> standard" because we have no standard to which to point.  Just
> conventions, some of which have been codified in style guides.

>> I think the point is more about sticking to conventional section names
>> if possible than about forbidding non-standard ones.

> I think I have seen Ingo do the latter, but I could be mistaken.

That likely wasn't what i meant; i agree more with the former
than with the latter, and do not want to "forbid" custom sections.

Maybe what you have in mind is that i abhor a few specific sections
that are occasionally seen in the wild, most notably OPTIONS
and NOTES.  Those are indeed always terrible style and deserve
to be shot on sight, with no warning.  OPTIONS is usually the most
important part of the DESCRIPTION and splitting it out is at best
pointless, but usually causes disorganization.  NOTES is almost
always the hallmark of a totally disorganized page.  The authors
failed to make up their mind which material logically belongs
together, such that below NOTES, they randomly return to aspects
that have been discussed before, but not discussed properly.

[...]
> [2] I credit OpenBSD with being perhaps the only BSD faction that seems
>     to have articulated consistent principles about the copyright
>     licenses they'll tolerate, and hewed to them in practice.

You are mistaken even in that respect.  The official policy is:
Not fully free code can stay in the tree as long as it is free
to use and redistribute, even if there are restrictions regarding
changing it (like in the GPL).  But code newly imported must be
free.  Still, LLVM got imported, simply because there is no
alternative to using GCC or LLVM.  A free compiler simply doesn't
exist, and OpenBSD doesn't have the resources to develop its own.

So even for OpenBSD, the practical approach is "be as free as
possible" knowing that absolute purity cannot be achieved.
So the difference to FreeBSD is one of degree, not one of
principle.  They tolerate more than we do (for example, ZFS
in the kernel), which we consider not free enough (and besides,
we consider the implementation of ZFS too sprawling to be viable
for us, so even if it were suddenly freed, we probably would
not adopt it).

Finally, you are vastly exaggerating the "Copyleft Allergy".
Sure, when we get a chance, we will gladly replace GPL code
with free code of equal or better quality.  But it is hardly
a priority.  OpenBSD still has GNU texinfo and GNU cvs in its
tree even though it wouldn't be *that* difficult to get rid of
them - it's just not very urgent.

The reasons for removing groff were, in descending order of importance:

 * By that point, mandoc(1) was a better replacement for the
   purposes of the OpenBSD base system (and those purposes
   do not require general purpose typesetting).
 * The version of groff we had back then was totally outdated
   and no one was interested in maintaining a copy in our tree.
 * The code quality of groff was blatantly inadequate for what
   OpenBSD is aiming for, and nobody was interested in doing code
   audits and cleanups on groff code - even if it had been under
   a free license, that wouldn't have changed.
 * Moving groff to the ports tree made it easier to provide
   a modern groff to users, so it actually helped users
   to do high-quality general-purpose typography.
 * Removing groff from base also allowed the removal of some legacy GPL
   code from base, but that was a by-product of minor importance and not
   among the main motivations.

Yours,
  Ingo
