Re: Zero Width Space (was Re: How to print a literal '.' as the first character in a line?)

Ingo Schwarze Sat, 04 Jun 2022 15:34:56 -0700

Hi,

James K. Lowden wrote on Sat, Jun 04, 2022 at 03:23:36PM -0400:
> On Thu, 5 May 2022 03:40:27 -0500 Dave Kemper wrote:


>> To cite the example that originally launched this thread, the old
>> docs termed the \& a "zero width space," which Branden has changed to
>> the "non-printing input break."  It may not roll off the tongue as
>> easily, but it's more precise and descriptive about what the escape
>> does: it affects how input is parsed, not how output is rendered.
>> It's not kin to other space escapes like \~ or \|, as the original
>> term implied.

> I disagree.  That's not what it does.  
> 
> The zero width space does not "affect how input is parsed".  It's
> parsed like all other input -- indeed, exactly like \| and \~.  Its only
> distinction from them is on output.  
> 
> To insert \& at the start of a line does not affect how the input is
> parsed.  *Any* character before a leading dot prevents the dot from
> being interpreted as a request.  The salient difference is that \&
> introduces nothing into the output stream.  Hence, "zero width".  
> 
> To me, the term "non-printing input break" verges on nonsense because
> it suggests there might be such a thing as "printing input".  There is
> not: input is processed and rendered as output.  Input is no
> more printed than it is written to the keyboard.  
> 
> I humbly suggest on this point we return to status quo ante.  A "zero
> width space" is perfectly clear terminology.  The fact that \& is used
> occasionally to prevent non-requests from being interpreted as requests
> is incidental, easily explained and understood.  Does anyone remember
> being confused by it?   I don't.  

James' argument makes sense to me.

On top of that, the groff documentation uses the term "break" very
consistently, defining it as starting a new output line even though
the current output line is not yet full.  In a few places, the term
"line break" is used as a synonym of "break", which, in my humble
opinion, is accurate and does not cause confusion.

The escape sequence "\:" is called "non-printing break point",
the "'" control character "no-break control character", which
both agree well with the way "break" is used.

Occasionally, a break is qualified as a "page break" or "column break",
which is fine insofar as every page break and every column break
implies an (output line) break in the usual sense, too.

The groff documentation also uses the term "non-breaking" =
"unbreakable" consistently, meaning that a break will not be
inserted at the place in question.

In very few cases, the groff documentation uses the term "break the
input line" to mean "start a new input line".  There is a small risk
that might cause confusion with "breaks" in the normal sense, but i
see no general way to avoid that risk.  In any case, all such places
i saw clearly use the qualifier "input", so careful readers should
not get confused.

So, to summarize, groff documentation consistently uses the word "break"
for "line break", almost always in the sense of output line break
and in a few clearly qualified cases for "input line break".

From this perspective, it is indeed unfortunate terminology to
call \& a "non-printing input break" because it has no relation
whatsoever to breaking the input line, nor to a "break" in the
general sense, i.e. breaking the output line.

I do realize the change was committed on Sat Aug 15 22:08:01 2020,
nearly two years ago, but when issues aren't noticed soon, finding
them later is still better than never.

More constructively, what *should* it be called?

In all ways i'm aware of, it behaves exactly like a horizontal
spacing escape sequence (except that its width is zero) and
exactly like a character (except that it prints an empty glyph
of width zero).  So both "zero-width space" and "non-printing
zero-width character" would seem accurate to me.  The former has
the advantage of being shorter and agreeing with traditional
terminology.  It's slightly unfortunate that Unicode uses the
character name "ZERO WIDTH SPACE" for what groff (more
appropriately) calls the "non-printing break point" (\:),
but i would consider consistency within the roff domain more
important than using the same terms as Unicode.  Consequently,
i'm *not* advocating calling \& a "zero-width non-joiner" or
a "zero width no-break space" even though both would be more
precise if we were aiming for Unicode-compatible terminology.
Then again, if people worry a lot about U+200B, then calling it
a "zero width no-break space" is still much better than calling
it some kind of a "break".

The argument "it is not a space because it doesn't move and it is
not a character because it doesn't print anything" reminds me a bit
of the argument "0 is not a number because there is nothing there",
yet mathematicians certainly call it a number all the same because
zero can be used in the same ways as a number.  The reason why \&
works for the escaping purposes it is used for is quite similar: it
is treated as if it were a space or character except that it neither
prints nor moves.  In all these cases, you can do the same escaping with
some other spacing escape sequence or with some other character if you
don't object to moving or printing a bit.
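
A minimal sketch of that last point, assuming the default control
character "." and using a made-up sample sentence.  Each text line
below starts with something other than the control character, so the
dot is taken as ordinary text; only the first variant adds neither
width nor ink in front of it:

```roff
.\" Hypothetical sample input; .nf keeps each input line on its own output line.
.nf
.\" \& has zero width and prints nothing, so the dot is the first visible glyph.
\&.profile is read at login.
.\" \| also protects the dot, but moves right by a thin space first.
\|.profile is read at login.
.\" An ordinary character protects the dot too, but it is printed.
x.profile is read at login.
.fi
```

Without any of these prefixes, the line would start with the control
character and be parsed as a control line rather than printed as text.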

So, i'd say, let's call it a day (err, a space).  It certainly
is *not* a break in any of the senses familiar from the groff
documentation.

Yours,
  Ingo

P.S.
Note that i'm not saying Branden is making our documentation worse,
quite to the contrary.  This looks like an unusual slip to me.
