Re: duplicate documentation, was: Release Candidate

Ingo Schwarze Sun, 15 Nov 2020 07:25:10 -0800

Hi Branden,

G. Branden Robinson wrote on Sun, Nov 15, 2020 at 11:49:48PM +1100:
> At 2020-11-14T17:03:42+0100, Ingo Schwarze wrote:


>> I would strongly oppose copying the same text to multiple
>> documentation files.  Apart from correctness and completeness,
>> conciseness is among the most important quality criteria for
>> documentation.  So having the same text repeated in more than
>> one place is among the worst suggestions you could possibly
>> come up with.

> I have to wonder how familiar you are with programming texts.

I love reading novels, even long ones, and often do so.  But i usually
refrain from reading books about programming because they tend to be
too long for my taste.  For example, i never finished Stroustrup because
i got too bored with all the redundancy.  Standards and references
serve better in the field of programming.  Why would i want a novel
mixed into a technical text?

> Why would anyone read K&R when they can absorb the ISO C standard?
> 
> You've identified multiple virtues: correctness, completeness, and
> concision.  One you've omitted is comprehensibility.

That's not a separate goal.  It's a consequence of the other goals.
Note that we are not talking about general pedagogy here like you
would use it in elementary school, but about teaching programmers.

> When people are learning (or refreshing themselves on) a technical
> system, they attempt to find a document of high overall relevance to
> their immediate goal.  Sometimes that goal is highly specific ("I need
> to know what command-line option of foo(1) with frobnitz boojums.") and
> sometimes more general ("I used AT&T troff 20 years ago and I remember
> the broad principles but I want to see what groff is like.").
> 
> I posit that you cannot construct a corpus of documentation wherein
> every true statement is asserted at most once, and reliably
> cross-referenced from all other conceivable points of corollary
> interest.
> 
> Documentation is an art, not a science,

That is of course true.
Including the fact that some repetition is unavoidable.

> and even in Russian-style
> mathematical literature (assumption, lemma, proof, repeat; no
> discussion), which I have to presume is your model,

Not quite; i do want concise sentences inserted explaining the
practical purpose of what is being described.

> people encounter barriers to the Platonic ideal.

True, too; and such barriers are invariably hit earlier than the
ultimate barrier that, informally speaking, complete and consistent
formal systems cannot be constructed.

> Moreover, there is a rule of pedagogy: repetition legitimizes.
> To get a concept across it often must be presented multiple times.

That's why, if you give an oral presentation, you don't just
read out the manual start to finish.  That's why, teaching
yourself, you don't just read the manual page once, start to
finish.  On first reading, you skip parts not yet relevant.
Later on, you skip parts you are already familiar with.

But frankly, i hit the opposite problem far, far more often.  In
programming practice, it barely happens at all that you cannot
understand a piece of correct, complete, concise documentation
because insufficient pedagogical skill was used in explaining it.
Programming is simple in principle.  There is nothing really
complicated like you find it in quantum field theory or in other
mathematical theories of similar complexity.

But i hit the opposite problem all the time: that i waste lots of
time figuring out whether i have a complete picture of all features
related to my question in the language or system at hand because the
documentation is too long, being mixed with irrelevant basics and
organizing the material according to some pedagogical idea rather
than systematically.

>> In general, automatically generating documentation is a bad idea.

> This claim is vacuous.  We do it all the time.  mandoc does it with man
> pages.  You intend to say _something_ here, but I'm not sure what.

I'm talking about the text, not about the markup.  Groff, mandoc,
pod2man and the like merely translate (human-writeable) semantic
annotations to (machine-readable) formatting instructions.
They do not auto-generate any text, like you are doing it with
groff_man / groff_man_style.

> Moreover, your claim, as far as I can interpret it, implies the very
> opposite of your earlier claims.  If complete, correct, concise
> documentation were formally modelable,

You understood my point correctly that not being formally modelable
is among the reasons why it needs to be written by hand.  But even
though not formally modelable, it still needs to strive for correctness,
completeness, and conciseness as far as possible.

I'm not responding to your attempt at reductio ad absurdum line by
line.  Instead, let me just say that many of the goals of documentation
conflict with each other, and then human judgement is needed to reach
a balance - since we are optimizing for multiple goals at the same
time, it cannot be an optimization in the mathematical sense, and
trying to formalize this process of human judgement often proves
counterproductive in practice.

Let me provide some canonical examples.

 1. The existence of the SYNOPSIS sections is an example of a
    compromise resolving a conflict of the goal of conciseness
    with itself.  Yes, you are right manual pages would be shorter
    without SYNOPSIS sections and still be complete and correct.
    But the gain in conciseness by deleting the SYNOPSIS - not
    talking about excessive, multi-line SYNOPSIS sections here -
    would be very minor because the SYNOPSIS is usually so short
    compared to the rest of the page.
    On the other hand, the SYNOPSIS provides a huge gain in
    conciseness because ever so often, i only need to look at
    the SYNOPSIS to have my question answered.
    Some people argue for a third level of conciseness vs.
    completeness, e.g. the --help option.  I consider that detrimental
    because it adds a larger amount of text than the SYNOPSIS for
    a lesser gain (because you already have both very concise and
    very complete docs even without --help.)  So personally, i think
    documentation is better without --help.  But i recognize that
    is a matter of opinion and some may make a different judgement
    call and prefer having this third level of conciseness, too.

 2. Comprehensibility on first, serial reading almost always
    conflicts with comprehensibility during repeated study in
    detail, and again, striking a balance is needed.
    For example, before diving into the options list, there should
    be a short paragraph stating what the general purpose and the
    default behaviour of the program are.  That is a huge gain for
    first serial reading, also handy for later revisiting as a
    concise reminder, and there is no major downside to a short
    paragraph.
    However, there must not be half a tutorial before getting to
    the meat of the matter.  As another example, in a section 1
    manual, section 5 material (for example the documentation of a
    domain specific language) must not be mixed in before documenting
    the program itself and its options.

 3. The goal of completeness, as i understand it, implies that
    redirecting to different pages should be avoided when possible.
    Consequently, the goals of completeness and conciseness are
    usually conflicting with each other, and again, judgement
    is needed to strike a balance.  For example, repeating
    short, formulaic sentences where appropriate is usually OK
    because reading them is much quicker and simpler than
    following a redirection; consider typical EXIT STATUS sections.
    But copying multiple long paragraphs of text into two distinct
    documents is an obvious (and an extreme) instance of totally
    missing this balance.

 4. SEE ALSO is a typical example where subtleties are often
    misunderstood.  That section should *not* have the same references
    as the main text.  The purpose of that section is to point to
    other pages that, by their general topic, are likely of interest
    to readers of the present page.  It should indeed *not* repeat
    each and every reference that was made in view of some specific
    detail in the main text.  Conversely, many of the entries in
    SEE ALSO do *not* need to be repeated anywhere in the main text.

>> So, if authors can't even be bothered to properly *write* the text -
>> knowing that the time for writing it will only be spent once,

> This claim is startling.  How much formal writing do you do?

I do it for a living.  Right now.

> A significant portion of the time spent in the crafting of any serious
> document is in revision.

Absolutely.  My current average for documenting a single function
in a C library is two hours of working time.  Time spent on reading
and analyzing the related code and time spent on revision of the
descriptions are of the same order of magnitude, i guess.

But i only spend those two hours once.  Maybe some more time may
be added to the bill if a way is found to improve it further, or if
those pesky coders keep revising the API.  But whatever version we
are talking about: it gets written once and read many times.

> Even if you didn't think this was true in general, it's plainly
> true of _me_ as any look at groff's commit history
> will attest.

I do think it is true in general, and taking shortcuts during the
arduous process of revising the text is among the most common
reasons why documentation ends up being wordy and awkward.

> They will read it if it effectively communicates what they want to know.
> One measure of effectiveness is how swiftly they can get in and get out
> without the mental effort of maintaining a lot of state; that is, having
> to chase a long chain of cross-references to get to the one unique place
> where a fact is recorded in Bibliotheca Schwarze.

Absolutely.  While providing cross-references that might help in some
situations is valuable, avoiding cross-references that would have to
be followed by next to every reader is important.  But that doesn't
mean duplicating substantial amounts of text is OK.  That ends up
having users compare text to figure out what the differences are.
Your example is so bad that people will be tempted to use diff(1)
to cope with it.

> Another factor in readability is the hedonic benefit of experiencing
> the prose.  This is a highly subjective factor, and a virtue too often
> absent from technical literature, which is why it has a reputation as
> dry and boring.  But it also explains why the most successful works in
> this discipline endure--because the writer is a talented stylist,
> has an agreeable tone and/or sense of humor.  Once I learned enough math
> to comprehend portions of Knuth's _Art of Computer Programming_ I was
> surprised--though I should not have been--at how lucid and fun he was to
> read.  I suspect that were you to take your editorial approach to his
> work, it would swiftly become unrecognizable.

I don't doubt that, and i concede that the Steve Hensons of this
world are more numerous than the Donald Knuths.  But please,
correctness, completeness, and conciseness are fundamental, critical
requirements.  If somebody is able to make the text endearing to
read on top of that, all the better.  But artistic finesse is not
very valuable when the fundamental goals are being missed.
And all that has nothing to do with making the reader read the same
text twice.

Also, in programming, it would seem more natural to me to search
for hedonic benefit in reading *code* rather than reading
*documentation* - and in that field, the Steve Hensons enjoy a
similar numerical advantage as in documentation.  ;-(

>> Yes, the current disaster with groff_man(7) / groff_man_style(7)
>> should be fixed at some point after release.  I think Branden
>> probably didn't intend it to stay this way,

> You guess wrong.  It's pretty close now to where I intended it, and I'm
> far happier with it now than I was after groff 1.22.4.  Amusingly
> enough, it was in part at _your_ insistence that I restructured it as it
> is, with a parent document that uses m4 to generate (1) a reference
> page, whose importance you emphasized, and (2) a pedagogical document
> for man page writers who care nothing about typesetting or any feature
> not directly relevant to their man(7) endeavors.

I'm not so surprised here.  We do share lots of goals and even lots
of ideas how to reach them, even though we disagree in a number of
respects now and then.

>> but did it as an intermediate step in the complex task of
>> disentangling a large and complicated page into two logically separate
>> parts.

> I think the document will continue to evolve, namely to incorporate an
> introduction covering "filling" and "breaking" and other concepts that
> are alien to plain-text-only users.

Explaining that a bit, in a more accessible way than the roff reference
manual explains .fi and .ad and .br and the like, and adding advice on how to
control filling and breaking portably and readably in a manual page,
does seem useful in groff_man_style(7).

It won't be the same text as anywhere else because it serves different
readers for a different purpose.  Nobody reading the roff reference
manual will have to fear missing anything by not looking at
groff_man_style(7).  Nobody reading groff_man_style(7) will have
to fear missing anything *relevant* by not looking at the roff
reference manual.

But groff_man(7) and groff_man_style(7) target exactly the same
audience for exactly the same purpose.  So they should not contain
large amounts of duplicate text.

Yours,
  Ingo
