Re: duplicate documentation, was: Release Candidate

G. Branden Robinson Sun, 15 Nov 2020 04:51:04 -0800

As so often happens, Ingo and I are at loggerheads.

At 2020-11-14T17:03:42+0100, Ingo Schwarze wrote:
> I would strongly oppose copying the same text to multiple
> documentation files.  Apart from correctness and completeness,
> conciseness is among the most important quality criteria for
> documentation.  So having the same text repeated in more than
> one place is among the worst suggestions you could possibly
> come up with.


I have to wonder how familiar you are with programming texts.  Why would
anyone read K&R when they can absorb the ISO C standard?

You've identified multiple virtues: correctness, completeness, and
concision.  One you've omitted is comprehensibility.

When people are learning (or refreshing themselves on) a technical
system, they attempt to find a document of high overall relevance to
their immediate goal.  Sometimes that goal is highly specific ("I need
to know what command-line option of foo(1) will frobnitz boojums.") and
sometimes more general ("I used AT&T troff 20 years ago and I remember
the broad principles but I want to see what groff is like.").

I posit that you cannot construct a corpus of documentation wherein
every true statement is asserted at most once, and reliably
cross-referenced from all other conceivable points of corollary
interest.

Documentation is an art, not a science, and even in Russian-style
mathematical literature (assumption, lemma, proof, repeat; no
discussion), which I have to presume is your model, people encounter
barriers to the Platonic ideal.

Moreover, there is a rule of pedagogy: repetition legitimizes.  To get
a concept across it often must be presented multiple times.

> In general, automatically generating documentation is a bad idea.

This claim is vacuous.  We do it all the time.  mandoc does it with man
pages.  You intend to say _something_ here, but I'm not sure what.

Moreover, your claim, as far as I can interpret it, implies the very
opposite of your earlier claims.  If complete, correct, concise
documentation were formally modelable, then it would indeed be amenable
to automatic generation.  You could perform a dependency analysis on it
and automatically generate a totally ordered document beginning with
axioms and proceeding with increasing ramifications until all claims
were exhausted.  (There may not be a _unique_ total ordering for reasons
which graph theorists and package management system writers are all too
familiar with, but you can find _some_ total ordering.)  Whether
intended or not, inclusion of tsort(1) in early Unix systems was a
genius move for stimulating generations of hackers to pick up some
discrete mathematics.  (I've used it for all sorts of things, including
planning an academic course of study.)
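
To make the dependency-analysis idea concrete, here is a minimal sketch in
Python of what tsort(1) does, using the standard library's graphlib module
(Python 3.9 or later).  The claim names and their dependencies are
hypothetical, invented purely for illustration; nothing here models real
documentation.

```python
from graphlib import TopologicalSorter

# Hypothetical claims: each key maps to the set of claims it depends on.
# These names are illustrative assumptions, not taken from any real manual.
depends_on = {
    "axiom": set(),
    "lemma-1": {"axiom"},
    "lemma-2": {"axiom"},
    "theorem": {"lemma-1", "lemma-2"},
}


def presentation_order(deps):
    """Return one valid total ordering in which every claim follows
    all the claims it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = presentation_order(depends_on)
print(order)
```

The two lemmas may come out in either order, which is the non-uniqueness
noted above: the sort guarantees only that "axiom" comes first and
"theorem" last.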

Off the top of my head, were we to adopt your approach to our man pages,
here are a few of the things we would lose.

1. Options sections.  Fair's fair, you've already stated objections to
   these existing at all (IIRC, you prefer options being raised inline
   as part of the Description section, and I changed our nroff(1) page
   at your prompting to do this).  At the same time, if the Options
   section exists, you want it to precede a great deal of other
   material, even if that material would be far more usefully presented
   before.  See refer(1).  How comprehensible are most of those options
   before the refer command language has been described?

   On top of this, many options have common meaning across multiple
   commands (a practice you elevate to principle, if I recall
   correctly).  So, under your ideal, a command's man page should only
   document options that it _uniquely_ recognizes, and all others should
   be behind a cross-reference.

2. Files sections; few commands open a non-operand file uniquely.  In
   groff, many pages make reference to device, font, and macro files.
   Most of these would need to move to some other document to which
   others would refer.

3. Environment sections; similar.

4. Synopsis.  Yes.  The synopsis says nothing that cannot be inferred
   from .TH and wherever else the options and operands are documented.

5. See also.  There is surely no reason for cross-references at the end
   of a document when they should be present already at the point they
   are required in the discussion.

6. Bugs.  A URL to a bug tracker or a master bug list, since some bugs
   can impact multiple components of the system, would eliminate
   redundancy.

> It is intended to be read by human beings, who will spend their
> time on reading it, and that's a valuable resource being spent.

Yes.

> So, if authors can't even be bothered to properly *write* the text -
> knowing that the time for writing it will only be spent once,

This claim is startling.  How much formal writing do you do?  A
significant portion of the time spent in the crafting of any serious
document is in revision.  Even if you didn't think this was true in
general, it's plainly true of _me_ as any look at groff's commit history
will attest.

> whereas the time for reading it will be spent many times - how can
> anybody reasonably be expected to read it?

They will read it if it effectively communicates what they want to know.
One measure of effectiveness is how swiftly they can get in and get out
without the mental effort of maintaining a lot of state; that is, having
to chase a long chain of cross-references to get to the one unique place
where a fact is recorded in Bibliotheca Schwarze.

Another factor in readability is the hedonic benefit of experiencing
the prose.  This is a highly subjective factor, and a virtue too often
absent from technical literature, which is why it has a reputation as
dry and boring.  But it also explains why the most successful works in
this discipline endure--because the writer is a talented stylist or
has an agreeable tone and/or sense of humor.  Once I learned enough math
to comprehend portions of Knuth's _Art of Computer Programming_ I was
surprised--though I should not have been--at how lucid and fun he was to
read.  I suspect that were you to take your editorial approach to his
work, it would swiftly become unrecognizable.

> Autogenerating documentation is just disrespectful of users, and so is
> copying duplicate text into multiple places.

I encourage you to invest some time in seriously pursuing this claim
with Wittgensteinian rigor and see where you end up.

> Yes, the current disaster with groff_man(7) / groff_man_style(7)
> should be fixed at some point after release.  I think Branden
> probably didn't intend it to stay this way,

You guess wrong.  It's pretty close now to where I intended it, and I'm
far happier with it now than I was after groff 1.22.4.  Amusingly
enough, it was in part at _your_ insistence that I restructured it as it
is, with a parent document that uses m4 to generate (1) a reference
page, whose importance you emphasized, and (2) a pedagogical document
for man page writers who care nothing about typesetting or any feature
not directly relevant to their man(7) endeavors.

> but did it as an intermediate step in the complex task of
> disentangling a large and complicated page into two logically separate
> parts.

I think the document will continue to evolve, namely to incorporate an
introduction covering "filling" and "breaking" and other concepts that
are alien to plain-text-only users.  But this is exactly the sort of
material you don't want there, even though it is essential to
groff_man_style(7)'s mission of being "helpful and accessible to man
page writers who may never read any other groff documentation." (NEWS)

Eventually, I'd like to have a list of "do"s and "don't"s, some
applicable to all man pages and some specific to the groff man pages'
"house style".  I would extract many of these from my commit messages.
This has been my plan for a long time, which is why my documentation
commits look the way they do.  But I have no clear idea of a time frame
for that work.

Regards,
Branden
