At 2025-02-23T20:22:38+0100, onf wrote:
> On Sun Feb 23, 2025 at 5:23 PM CET, Benno Schulenberg wrote:
> > > Regarding the hyphenation problem, the lump in the carpet moves
> > > after groff 1.22.4. Because HTML output now produces −
> > > entities rather than hyphens from *roff minus special characters
> > > (`\-`), [...]
> >
> > Ouch! I don't like the look of them on the HTML page: they are
> > "enormous", looking like ndashes. And what's worse: when one
> > copy-pastes such an option (−−breaklonglines, for example) to
> > a terminal, nano thinks it is a file name. Aarrr! :(
>
> Sounds like you want to get a regular hyphen in the output, not a
> minus or dash. That has a very simple solution: use an actual hyphen
> (-) instead of the special character \-, which means minus.

No, that will produce incorrect output on other output devices.

When people want to copy and paste examples from rendered man page
documents to a shell prompt, for example, they typically want not a
hyphen, not a minus, not an em dash, not an en dash, nor any other sort
of dash, but the "hyphen-minus", the invention of the committee that
produced USAS X3.4-1968.

The situation is complex, not simple. See, for example:

https://lwn.net/Articles/947941/

...the bug that led to the article....

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1041731

...our advice in groff_man_style(7)...

     • Some ASCII characters look funny or copy and paste wrong.

       On devices with large glyph repertoires, like UTF‐8‐capable
       terminals and PDF, GNU troff, like AT&T troff before it, maps
       several keycaps to code points outside the Unicode basic Latin
       range (historically “ASCII”) because that usually results in
       better typography in the general case. When documenting
       GNU/Linux command or C language syntax, however, this
       translation is sometimes not desirable.

       To get a “literal”...
                                        ...should be input.
              ────────────────
              '        \(aq
              -        \-
              \        \(rs
              ^        \(ha
              `        \(ga
              ~        \(ti
              ────────────────

       Additionally, if a neutral double quote (") is needed in a macro
       argument, you can use \(dq to get it. You should not use \(aq
       for an ordinary apostrophe (as in “can’t”) or \- for an ordinary
       hyphen (as in “word‐aligned”). Review subsection “Portability”
       above.

...and further background in groff_char(7).

     All of the following characters map to glyphs as you would expect.

     ┌───────────────────────────────────────────────────────────┐
     │ ! # $ % & ( ) * + , . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ │
     │ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] _ │
     │ a b c d e f g h i j k l m n o p q r s t u v w x y z { | } │
     └───────────────────────────────────────────────────────────┘

     The remaining ordinary characters surprise computing professionals
     and others intimately familiar with the ISO character encodings.
     The developers of AT&T troff chose mappings for them that would be
     useful for typesetting technical literature in a broad range of
     scientific disciplines: Bell Labs used the system to prepare
     AT&T’s patent filings with the U.S. government. Further, the
     prevailing character encoding standard in the 1970s, USAS
     X3.4‐1968 (ASCII), deliberately supported semantic ambiguity at
     some code points, and outright substitution at several others, to
     suit the localization demands of various national standards
     bodies.

     The table below presents the seven exceptional code points with
     their typical keycap engravings, their glyph mappings and
     semantics in roff systems, and the escape sequences producing the
     Unicode basic Latin character they replace.

     ...

     On devices with a limited glyph repertoire, glyphs in the “keycap”
     and “appearance” columns on the same row of the table may look
     identical; except for the neutral double quote, this will not be
     the case on more‐capable devices. Review your document using as
     many different output devices as possible.

     ┌───────────────────────────────────────────────────────────────────┐
     │ Keycap   Appearance and meaning   Special character and meaning   │
     ├───────────────────────────────────────────────────────────────────┤
     │ "        " neutral double quote   \[dq] neutral double quote      │
     │ '        ’ closing single quote   \[aq] neutral apostrophe        │
     │ -        ‐ hyphen                 \- or \[-] minus sign/Unix dash │
     │ \        (escape character)       \e or \[rs] reverse solidus     │
     │ ^        ˆ modifier circumflex    \[ha] circumflex/caret/“hat”    │
     │ `        ‘ opening single quote   \(ga grave accent               │
     │ ~        ˜ modifier tilde         \[ti] tilde                     │
     └───────────────────────────────────────────────────────────────────┘

     The hyphen‐minus is a particularly unfortunate case of
     overloading. Its awkward name in ISO 8859 and later standards
     reflects the many distinguishable purposes to which it had already
     been put by the 1980s, including a hyphen, a minus sign, and
     (alone or in repetition) dashes of varying widths. For best
     results in roff systems, use the “-” character in input outside an
     escape sequence only to mean a hyphen, as in the phrase
     “long‐term”. For a minus sign in running text or a Unix file name
     or command‐line option dash, use \- (or \[-] in groff if you find
     it helps the clarity of the source document). (Another minus
     sign, for use in mathematical expressions, is available as \(mi.)
     AT&T troff supported em‐dashes as \(em, as does groff.

The man(7) and mdoc(7) macro packages remap the `\-` special character
escape sequence to a hyphen-minus because the ambiguous hyphen-minus
character is demanded overwhelmingly more often in that documentary
domain. In other typesetting applications, that is not necessarily
true, and so groff does not perform such a remapping by default.

Regards,
Branden
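
The escape table quoted from groff_man_style(7) above can be applied
mechanically to a command line that is meant to be copied and pasted.
What follows is only a minimal sketch in Python and is not part of
groff: the names `LITERAL_ESCAPES` and `roff_literal` are hypothetical,
and the mapping covers just the six keycaps listed in that table (the
`\(dq` case for macro arguments is left out).

```python
# Hypothetical helper, not part of groff.  It replaces the six keycaps
# from the groff_man_style(7) table with the special character escape
# sequences that yield the Unicode basic Latin character in output.
LITERAL_ESCAPES = {
    "'": r"\(aq",   # neutral apostrophe
    "-": r"\-",     # minus sign / Unix dash (hyphen-minus in man pages)
    "\\": r"\(rs",  # reverse solidus
    "^": r"\(ha",   # circumflex / caret / "hat"
    "`": r"\(ga",   # grave accent
    "~": r"\(ti",   # tilde
}


def roff_literal(text: str) -> str:
    """Return text with each affected character replaced by its escape."""
    return "".join(LITERAL_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Prints: nano \-\-breaklonglines \(ti/notes.txt
    print(roff_literal("nano --breaklonglines ~/notes.txt"))
```

Such a helper suits only literal syntax such as option names, file
names, and code. Running prose should keep the plain "-" and "'"
keycaps, per the advice quoted above against using \- for an ordinary
hyphen or \(aq for an ordinary apostrophe.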