Re: breaks and the no-break control character

G. Branden Robinson Fri, 05 May 2023 11:25:33 -0700

At 2023-05-04T20:21:01-0500, Dave Kemper wrote:
> Discussion about the request invocation 'br in
> http://savannah.gnu.org/bugs/?62776#comment14 made me also wonder
> about 'sp, another seemingly paradoxical invocation.  But like 'br,
> this seems clearly enough documented in the info manual:
> 
> "Several other requests imply breaks: ... 'sp', ....  If the no-break
> control character is used with any of these requests, GNU 'troff'
> suppresses the break; instead the requested operation takes effect at
> the next break."
> 
> However, the actual effect is perhaps not quite what one would expect
> from reading this:
> 
> $ cat sp.test
> Foo.
> .sp 2
> Bar.
> .pl \n(nlu
> $ nroff -ww sp.test
> Foo.
> 
> 
> Bar.
> $ sed "s/^\\.sp/'sp/" sp.test | nroff -ww
> 
> 
> Foo.  Bar.
> $


Yes.  Forgetting that you hadn't posted this as a ticket comment, and
unable to find it there, I came up with my own very similar reproducer.

$ cat sp-with-and-without-break.roff
foo
.sp 2
bar
.br
baz
'sp 2
qux
.pl \n(nlu
$ DWBHOME=. ./bin/nroff sp-with-and-without-break.roff 
foo


bar


baz qux
$

As you might have guessed, that's DWB 3.3 nroff output.  groff's is the
same.

> The first run produces unsurprising results.  What the second seems to
> show is that groff detects a break (presumably upon EOF, since .pl
> doesn't cause one), then processes the queued sp request, and only
> THEN flushes its pending output.  This is the opposite of what I would
> have assumed upon reading "the requested operation takes effect at the
> next break," since in my mental conception, part of what constitutes a
> "break" is outputting pending text.  But I suppose technically "at the
> next break" is ambiguous about whether any queued request(s) happen(s)
> before or after the break itself.
> 
> Does this behavior surprise anyone else?  (Heirloom troff does the
> same thing.)  Does anyone else think these sentences in the manual
> don't capture the nuance of the situation?  And does that even matter,
> given that "'sp" is kind of a weird thing to say anyway?

It _did_ surprise me.  My first inclination was to propose yet another
radical reform.  In my mental model, breaking already implies moving the
drawing position down by one vee (and performing the device's equivalent
of a carriage return), so "'sp N" should move the drawing position down
by only N-1 vees.  That would break compatibility with other troffs, and
in such a fundamental formatting operation (albeit a dusty corner of
_syntax_) that even I hesitated.

But there may be an easier way to explain/rationalize this.

We already say this in our Texinfo manual:

  Output line properties like page offset, indentation, and adjustment
  are not determined until the line has been broken.

(...and someday we should say it in roff(7) or groff(7), too.)

We can assert that the position of the text baseline is one of those
properties of the output line that is not determined until the line has
been broken.  And what the `sp` request really does is decide where your
next text baseline is going to be.
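
To make that reading concrete, here is a toy model in Python.  It is
only a sketch under loud assumptions: it is not how groff or DWB troff
is implemented, it knows only text lines, `br`, and `sp`, it never fills
to a line length, and the names `format_lines` and `do_break` are made
up for illustration.  Spacing moves the place where the next output line
will land; only the `.` form flushes the pending line first.

```python
def format_lines(source):
    """Toy model: return the output lines for a tiny subset of roff input."""
    output = []   # lines already placed on the page
    pending = []  # words collected for the output line not yet broken

    def do_break():
        # A break writes out the pending output line, if there is one.
        if pending:
            output.append(" ".join(pending))
            pending.clear()

    for line in source.splitlines():
        if line.startswith((".sp", "'sp")):
            fields = line.split()
            count = int(fields[1]) if len(fields) > 1 else 1
            if line[0] == ".":
                do_break()              # '.sp' breaks before spacing
            output.extend([""] * count)  # "'sp" spaces without breaking
        elif line.startswith(".br"):
            do_break()
        elif line.startswith((".", "'")):
            continue                    # other requests are ignored here
        else:
            pending.extend(line.split())
    do_break()                          # end of input flushes pending text
    return output


sample = "foo\n.sp 2\nbar\n.br\nbaz\n'sp 2\nqux\n.pl \\n(nlu"
print("\n".join(format_lines(sample)))
```

Fed the reproducer above, the model yields foo, two blank lines, bar,
two blank lines, and then "baz qux" on one line, which matches what both
formatters print.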

This even begins to make sense to me when I consider that a common
operation in non-man-page typesetting is the placement of superscripts
and subscripts.  In mathematical typography particularly, these can be
chained.  The me(7) package has explicit support for growing the height
of the output line--in other words, moving the text baseline down--to
accommodate the placement of glyphs that would crowd the ordinary 1v of
distance between text baselines.  And further, the "height of a text
line" is a formally complex piece of machinery, with several different
elements that ultimately construct it.

Buried under the description of the `vs` request in our Texinfo manual
is the following disquisition.

  When a break occurs, GNU 'troff' performs the following procedure.

     * Move the drawing position vertically by the "extra pre-vertical
       line space", the minimum of all negative '\x' escape sequence
       arguments in the pending output line.

     * Move the drawing position vertically by the vertical line
       spacing.

     * Write out the pending output line.

     * Move the drawing position vertically by the "extra post-vertical
       line space", the maximum of all positive '\x' escape sequence
       arguments in the line that has just been output.

     * Move the drawing position vertically by the "post-vertical line
       spacing" (see below).

     Prefer 'vs' or 'pvs' over 'ls' to produce double-spaced documents.
  'vs' and 'pvs' have finer granularity than 'ls'; moreover, some
  preprocessors assume single spacing.  *Note Manipulating Spacing::,
  regarding the '\x' escape sequence and the 'ls' request.
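
The five steps quoted above can be sketched as plain arithmetic.  This
is a hedged illustration rather than GNU troff's code: the function
`break_line` and its parameters are hypothetical, all quantities share
one unit (say, points), and I assume the magnitude of the most negative
`\x` argument is what gets added above the line.  The default of 0 for
the post-vertical line spacing is likewise just this sketch's choice.

```python
def break_line(position, x_args, vertical_spacing, post_vertical_spacing=0):
    """Return (baseline, new_position) after one break in a toy model."""
    # Extra pre-vertical line space: the minimum of all negative \x
    # arguments; its magnitude is assumed to be added above the line.
    extra_pre = min((x for x in x_args if x < 0), default=0)
    # Extra post-vertical line space: the maximum of all positive ones.
    extra_post = max((x for x in x_args if x > 0), default=0)

    position += abs(extra_pre)         # step 1: extra space above
    position += vertical_spacing       # step 2: vertical line spacing
    baseline = position                # step 3: the line is written here
    position += extra_post             # step 4: extra space below
    position += post_vertical_spacing  # step 5: post-vertical line spacing
    return baseline, position


# Several \x requests on one line do not add up; only the extremes count.
print(break_line(0, [-3, -5, 2, 4], 12))
```

With a 12-unit vertical spacing and `\x` arguments of -3, -5, 2, and 4,
the sketch puts the baseline at 17 and leaves the drawing position at
21.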

We seem to have an ugly bit of non-orthogonality in this area.

quantity                                register access

extra pre-vertical line spacing         n/a
vertical (line) spacing                 .v
extra post-vertical line spacing        .a
post-vertical line spacing              .pvs

Also I'm not convinced of the accessibility of the terminology in use.

If I want to know within an output line how much my line's already been
jacked by some tetration-wielding naïf who hasn't learned Knuth's
up-arrow notation, apparently in groff I'm SOL.

Hmm, in DWB 3.3 troff too.  Apparently `.a` is not updated until _after_
the line is output.

Okay, I think I see how this was supposed to work now.  Since the
addition of pre- and post- line spacing with \x was already using
saturating arithmetic[1] in both directions (above = negative, below =
positive), you didn't really _need_ to be able to access these
quantities on the same output line; just demand what you needed with
`\x` and let the formatter do its thing.

_But_, when typesetting adjacent lines that resort to extra line
spacing, then extra post-line spacing on line N might mean that you
don't need as much, or even _any_, on line N+1: you will already have
"headroom" for your tetrated superscripts or giant brackets, assuming
the glyphs from the two different lines don't outright collide.[2]
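
A worked example of that headroom arithmetic, as I read it.  This is
purely illustrative: `extra_pre_needed` is a hypothetical helper, not
anything a macro package provides, and it assumes the extra post-line
spacing of line N (what `.a` reports once that line is out) counts in
full toward the clearance wanted above line N+1.

```python
def extra_pre_needed(required_headroom, previous_post_extra):
    """Extra space still to request above line N+1 (same unit throughout)."""
    # Whatever line N already left below itself is headroom we get for
    # free; never return a negative amount.
    return max(0, required_headroom - previous_post_extra)


print(extra_pre_needed(8, 6))   # some headroom already there
print(extra_pre_needed(8, 10))  # line N left more than enough
```

In the first case only 2 more units are wanted; in the second, none.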

Thanks for the trip down the rabbit hole, Dave--I'll have my revenge. ;)

More seriously, our documentation should do more to motivate the
existence of these features.

Regards,
Branden

[1] possibly an abuse of terminology
[2] which we should be able to detect and warn about since we know where
    all the glyph bounding boxes are on the page[3]
[3] On the gripping hand I've heard that collision detection is not
    necessarily a trivial problem--I'm not sure; I never did video games
    or simulators for robots.  I think I had something isomorphic to
    this problem in an interview once.  I remember it involved
    unbounded resolution (i.e., there was no limit to how finely you
    could partition the space to look for intersections/collisions, so
    there was no brute-force approach).  After drawing some sketches and
    describing in limited terms my grasp of binary space partitioning, I
    ventured my suspicion that it was intractable or an open problem in
    computer science, and didn't get the job.  If I'd been more glib
    maybe I'd have said something about Hilbert curves.
