At 2025-02-23T20:22:38+0100, onf wrote:
> On Sun Feb 23, 2025 at 5:23 PM CET, Benno Schulenberg wrote:
> > > Regarding the hyphenation problem, the lump in the carpet moves
> > > after groff 1.22.4.  Because HTML output now produces &minus;
> > > entities rather than hyphens from *roff minus special characters
> > > (`\-`), [...]
> >
> > Ouch!  I don't like the look of them on the HTML page: they are
> > "enormous", looking like ndashes.  And what's worse: when one
> > copy-pastes such an option (−−breaklonglines, for example) to
> > a terminal, nano thinks it is a file name.  Aarrr!  :(
> 
> Sounds like you want to get a regular hyphen in the output, not a
> minus or dash. That has a very simple solution: use an actual hyphen
> (-) instead of the special character \-, which means minus.

No, that will produce incorrect output on other output devices.  When
people want to copy and paste examples from rendered man page documents
to a shell prompt, for example, they typically want not a hyphen, not a
minus, not an em dash, not an en dash, nor any other sort of dash, but
the "hyphen-minus", the invention of the committee that produced
USAS X3.4-1968.
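
To make the distinction concrete, here is a small Python sketch (my
own illustration, not anything groff ships) that prints the Unicode
names of the dash-like characters in question and shows why a pasted
option that begins with minus signs is not recognized as an option.
The `pasted` string reuses the option name from the report above.

```python
import unicodedata

# Dash-like characters that can look alike in rendered output.
DASH_LIKE = ["\u002d", "\u2010", "\u2212", "\u2013", "\u2014"]

for ch in DASH_LIKE:
    # Print each code point with its official Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# An option copied from a page that rendered `\-` as U+2212 MINUS SIGN.
pasted = "\u2212\u2212breaklonglines"

# A command-line parser looks for U+002D, so this is not an option to it.
print(pasted.startswith("--"))
```

Only the first entry, U+002D, is what a shell or option parser treats
as an option dash.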

The situation is complex, not simple.

See, for example:

https://lwn.net/Articles/947941/

...the bug that led to the article....

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1041731

...our advice in groff_man_style(7)...

     • Some ASCII characters look funny or copy and paste wrong.

       On devices with large glyph repertoires, like UTF‐8‐capable
       terminals and PDF, GNU troff, like AT&T troff before it, maps
       several keycaps to code points outside the Unicode basic Latin
       range (historically “ASCII”) because that usually results in
       better typography in the general case.  When documenting
       GNU/Linux command or C language syntax, however, this translation
       is sometimes not desirable.

       To get a “literal”...   ...should be input.
       ────────────────────────────────────────────
                           '   \(aq
                           -   \-
                           \   \(rs
                           ^   \(ha
                           `   \(ga
                           ~   \(ti
       ────────────────────────────────────────────

       Additionally, if a neutral double quote (") is needed in a macro
       argument, you can use \(dq to get it.  You should not use \(aq
       for an ordinary apostrophe (as in “can’t”) or \- for an ordinary
       hyphen (as in “word‐aligned”).  Review subsection “Portability”
       above.

and further background in groff_char(7).

     All of the following characters map to glyphs as you would expect.
       ┌───────────────────────────────────────────────────────────┐
       │ ! # $ % & ( ) * + , . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ │
       │ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] _ │
       │ a b c d e f g h i j k l m n o p q r s t u v w x y z { | } │
       └───────────────────────────────────────────────────────────┘
     The remaining ordinary characters surprise computing professionals
     and others intimately familiar with the ISO character encodings.
     The developers of AT&T troff chose mappings for them that would be
     useful for typesetting technical literature in a broad range of
     scientific disciplines: Bell Labs used the system to prepare AT&T’s
     patent filings with the U.S. government.  Further, the prevailing
     character encoding standard in the 1970s, USAS X3.4‐1968 (ASCII),
     deliberately supported semantic ambiguity at some code points, and
     outright substitution at several others, to suit the localization
     demands of various national standards bodies.

     The table below presents the seven exceptional code points with
     their typical keycap engravings, their glyph mappings and semantics
     in roff systems, and the escape sequences producing the Unicode
     basic Latin character they replace. ... On devices with a limited
     glyph repertoire, glyphs in the “keycap” and “appearance” columns
     on the same row of the table may look identical; except for the
     neutral double quote, this will not be the case on more‐capable
     devices.  Review your document using as many different output
     devices as possible.

   ┌───────────────────────────────────────────────────────────────────┐
   │ Keycap   Appearance and meaning   Special character and meaning   │
   ├───────────────────────────────────────────────────────────────────┤
   │ "        " neutral double quote   \[dq] neutral double quote      │
   │ '        ’ closing single quote   \[aq] neutral apostrophe        │
   │ -        ‐ hyphen                 \- or \[-] minus sign/Unix dash │
   │ \        (escape character)       \e or \[rs] reverse solidus     │
   │ ^        ˆ modifier circumflex    \[ha] circumflex/caret/“hat”    │
   │ `        ‘ opening single quote   \(ga grave accent               │
   │ ~        ˜ modifier tilde         \[ti] tilde                     │
   └───────────────────────────────────────────────────────────────────┘

     The hyphen‐minus is a particularly unfortunate case of overloading.
     Its awkward name in ISO 8859 and later standards reflects the many
     distinguishable purposes to which it had already been put by the
     1980s, including a hyphen, a minus sign, and (alone or in
     repetition) dashes of varying widths.  For best results in roff
     systems, use the “-” character in input outside an escape sequence
     only to mean a hyphen, as in the phrase “long‐term”.  For a minus
     sign in running text or a Unix file name or command‐line option
     dash, use \- (or \[-] in groff if you find it helps the clarity of
     the source document).  (Another minus sign, for use in mathematical
     expressions, is available as \(mi.)  AT&T troff supported em‐dashes
     as \(em, as does groff.

The man(7) and mdoc(7) macro packages remap the `\-` special character
escape sequence to a hyphen-minus because the ambiguous hyphen-minus
character is demanded overwhelmingly more often in that documentary
domain.  In other typesetting applications, that is not necessarily
true, and so groff does not perform such a remapping by default.
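
For man page authors, the groff_man_style(7) table quoted above can be
applied mechanically to literal text such as command lines.  The
following Python sketch is only an illustration under that assumption;
the helper name `roff_literal` and the sample command line are
hypothetical, and the function should not be run over ordinary prose,
where apostrophes and hyphens are meant to stay unescaped.

```python
# Escapes from the "To get a literal... should be input" table in
# groff_man_style(7), as quoted earlier in this message.
LITERAL_ESCAPES = str.maketrans({
    "'": r"\(aq",
    "-": r"\-",
    "\\": r"\(rs",
    "^": r"\(ha",
    "`": r"\(ga",
    "~": r"\(ti",
})


def roff_literal(text: str) -> str:
    """Return man page input meant to render `text` with literal glyphs.

    Hypothetical helper: str.translate() works in a single pass, so the
    backslashes it inserts are not escaped a second time.
    """
    return text.translate(LITERAL_ESCAPES)


# Sample command line; the file name is made up for illustration.
print(roff_literal("nano --breaklonglines ~/notes"))
```

The result is roff input, so it still has to pass through a macro such
as `.B` or appear in an example block of the man page source.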

Regards,
Branden
