1.23: UTF-8 device: more display oddities
Hello.

Leaving aside the hyphen-minus -> hyphen thing that i fixed for me
locally, there is also the problem that

  ` U+0060, GRAVE ACCENT, "backtick"

is displayed as

  ‘ U+2018, LEFT SINGLE QUOTATION MARK

which, in Liberation Mono (at least!), reverses the direction of the
tick. I was looking at a manual which uses backtick syntax notation
for sh(1)ell commands (aka i=`echo one`, not new-style i=$(echo one)),
and it _really_ looks strange.

Could something be done about this, please?
Thank you.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: 1.23: UTF-8 device: more display oddities
Steffen Nurpmeso wrote in
 <20220916213112.5dabw%stef...@sdaoden.eu>:
|Hello.
|
|Leaving aside the hyphen-minus -> hyphen thing that i fixed for me
|locally, there is also the problem that
|
|  ` U+0060, GRAVE ACCENT, "backtick"
|
|is displayed as
|
|  ‘ U+2018, LEFT SINGLE QUOTATION MARK

Also

  ~ U+007E, TILDE

is displayed as

  ˜ U+02DC, SMALL TILDE

which here sits at the height of an accent, for example the

  ^ U+005E, CIRCUMFLEX ACCENT

Putting it all together it really looks totally odd here:

  i=`echo '~/home^run'`

becomes

  i=‘‘echo ’˜/homeˆrun’‘’

How is anyone supposed to document a sh(1)ell-style manual with
mdoc(7) (i do not know about man(7)) with these settings?

--steffen
Re: 1.23: UTF-8 device: more display oddities
At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
> |Leaving aside the hyphen-minus -> hyphen thing that i fixed for me
> |locally, there is also the problem that
> |
> |  ` U+0060, GRAVE ACCENT, "backtick"
> |
> |is displayed as
> |
> |  ‘ U+2018, LEFT SINGLE QUOTATION MARK
>
> Also
>
>   ~ U+007E, TILDE
>
> is displayed as
>
>   ˜ U+02DC, SMALL TILDE
>
> which here sits at the height of an accent, for example the
>
>   ^ U+005E, CIRCUMFLEX ACCENT
>
> Putting it all together it really looks totally odd here:
>
>   i=`echo '~/home^run'`
>
> becomes
>
>   i=‘‘echo ’˜/homeˆrun’‘’
>
> How is anyone supposed to document a sh(1)ell-style manual with
> mdoc(7) (i do not know about man(7)) with these settings?

By reading the manual, Steffen.

UTF-8 content follows.

groff_char(7):
...
       On ISO systems, code points in the range 33–126 comprise a
       common set of printable glyphs in all of the aforementioned ISO
       character encoding standards. It is this character set and
       (with some noteworthy exceptions) the corresponding glyph
       repertoire for which AT&T troff was implemented.
...
       The table below presents the seven exceptional code points with
       their typical keycap engravings, their glyph mappings and
       semantics in roff systems, and the escape sequences producing
       the Unicode basic Latin character they replace. The first, the
       neutral double quote, is a partial exception because it does
       represent itself, but since the roff language also uses it to
       quote macro arguments, groff supports a special character
       escape sequence as an alternative form so that the glyph can be
       easily included in macro arguments without requiring the user
       to master the quoting rules that AT&T troff required in that
       context. (Some requests, like ds, also treat " non‐literally.)
       Furthermore, not all of the special character escape sequences
       are portable to AT&T troff and all of its descendants; these
       groff extensions are presented using its special character form
       \[], whereas portable special character escape sequences are
       shown in the traditional \( form. \- and \e are portable to all
       known troffs. \e means “the glyph of the current escape
       character”; it therefore can produce unexpected output if the
       ec request is used.

       On devices with a limited glyph repertoire, glyphs in the
       “keycap” and “appearance” columns on the same row of the table
       may look identical; except for the neutral double quote, this
       will not be the case on more‐capable devices. Review your
       document using as many different output devices as possible.

       ┌────────────────────────────────────────────────────────────────────────┐
       │ Keycap   Appearance and meaning    Special character and meaning       │
       ├────────────────────────────────────────────────────────────────────────┤
       │ "        " neutral double quote    \[dq]        neutral double quote   │
       │ '        ’ closing single quote    \[aq]        neutral apostrophe     │
       │ -        ‐ hyphen                  \- or \[-]   minus sign/Unix dash   │
       │ \        (escape character)        \e or \[rs]  reverse solidus        │
       │ ^        ˆ modifier circumflex     \(ha         circumflex/caret/“hat” │
       │ `        ‘ opening single quote    \(ga         grave accent           │
       │ ~        ˜ modifier tilde          \(ti         tilde                  │
       └────────────────────────────────────────────────────────────────────────┘

There is also the "Portability" section of groff_man(7) [groff 1.22.4]
or groff_man_style(7) [groff 1.23].

       Several special characters are also widely portable. AT&T troff
       did not define the reverse solidus or quotation characters
       listed below, but any of its descendants, like Plan 9 or Solaris
       troff, can support them by defining their glyphs in font
       description files; see groff_font(5).

       \-     Minus sign or basic Latin hyphen‐minus. This escape
              sequence produces the Unix command‐line option dash in
              the output. “-” is a hyphen in the roff language; some
              output devices replace it with U+2010 (hyphen) or
              similar.

       \(aq   Basic Latin neutral apostrophe. Some output devices
              replace “'” with a right single quotation mark.

       \(oq
       \(cq   Opening (left) and closing (right) single quotation
              marks. Use these for paired directional single quotes,
              ‘like this’.

       \(dq   Basic Latin quotation mark (double quote). Use in macro
              calls to prevent ‘"’ from being interpreted as beginning
              a quoted argument, or simply for readability.

                     .TP
                     .BI "split \(dq" text \(dq

       \(lq
       \(rq   Left and right double quotation marks. Use these for
              paired directional double quotes, “like this”.
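As an illustration of the tables above (this is not part of the original message), the example line from earlier in the thread could be written in a man(7) source roughly as follows. It is only a minimal sketch: the introductory sentence is hypothetical, and .EX/.EE are extensions that not every man(7) implementation provides.

```roff
.\" Hypothetical man(7) fragment; the sentence and the command are
.\" illustrative only.  Each ASCII character that roff remaps is
.\" spelled with the special character escape from the table above.
.PP
Old-style command substitution can be written as
.PP
.EX
i=\(gaecho \(aq\(ti/home\(harun\(aq\(ga
.EE
```

Written this way, the line should come out as i=`echo '~/home^run'` on the utf8 device as well as on typesetter devices, since \(ga, \(aq, \(ti and \(ha are documented as producing the basic Latin grave accent, apostrophe, tilde and circumflex.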
Re: 1.23: UTF-8 device: more display oddities
G. Branden Robinson wrote in
 <20220916223236.lmkf3brdwotdn2fd@illithid>:
|At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
..
|>   i=`echo '~/home^run'`
|>
|> becomes
|>
|>   i=‘‘echo ’˜/homeˆrun’‘’
|>
|> How is anyone supposed to document a sh(1)ell-style manual with
|> mdoc(7) (i do not know about man(7)) with these settings?
|
|By reading the manual, Steffen.

Ok, and you put a lot of effort in it in the last years.
But the point is: last week it looked _entirely_ different, and the
locale has not changed! The manual has not changed either.
Just to remind you that the hyphen-minus -> hyphen change was
committed in March _this_ year. So it is you -- you are changing
things backward incompatibly!

|UTF-8 content follows.
|
|groff_char(7):
...

Please note again i am doing mdoc(7) here, not mom or ms or my own
macros.

|There is also the "Portability" section of groff_man(7) [groff 1.22.4]
|or groff_man_style(7) [groff 1.23].
|
|       Several special characters are also widely portable. AT&T troff
...

But there is nothing special. Input characters are mapped away
differently than before.

...
|       \(ha   Basic Latin circumflex accent (“hat”). Some output
|              devices replace “^” with U+02C6 (modifier letter
|              circumflex accent) or similar.
...
|       \(ti   Basic Latin tilde. Some output devices replace “~” with
|              U+02DC (small tilde) or similar.

But why? And furthermore: why -Tutf8 that lives on and with
fixed-width monospace fonts in practically all cases.
And why differently than before?

|Or you can just do the brute force thing. From groff 1.23's "PROBLEMS"
|file:

But this changes manuals written over the last decades to something
completely different, Branden.
I am coming from 1.22.3. It looked entirely different last week.
You cannot expect all those people to rewrite all their manuals
because you feel like mapping monospace -Tutf8 to be on par with
-Tpdf with all its font powers (used or not)?
I really do not understand these decisions.

Please note also mandoc (at least the version i have here) renders it
the way i _expect_. Maybe there is a reason why now also Apple i
think switches away from groff to mandoc?

...
|* When viewing man pages, some characters on my UTF-8 terminal emulator
|  look funny or copy-and-paste wrong. Why?
|
|Some Unicode Basic Latin ("ASCII") input characters are mapped to
|non-Basic Latin code points in output for consistency with other output
|devices, like PDF. See groff_man_style(7) and groff_char(7) for correct
...

Uh!

...
|However, many man pages are written in ignorance of the correct special
|characters to obtain the desired glyphs. You can conceal these errors

Heh! _Exactly_!

...
|by adding the following to your site-local man(7) configuration. The
|file is called "man.local"; its installation directory depends on how
|groff was configured when it was built.
|
|--- start ---
|.if '\*[.T]'utf8' \{\
|.  char ' \[aq]
|.  char - \-
|.  char ^ \[ha]
|.  char ` \[ga]
|.  char ~ \[ti]
|.\}

You know, if you would provide a commented-out setting to change the
decade-old default behaviour to what you feel is more modern, or
"better", _then_ i could understand it. I mean i produce backward
incompatible changes myself all the time, but i give plenty of hints.
For example $
Re: 1.23: UTF-8 device: more display oddities
At 2022-09-17T01:00:26+0200, Steffen Nurpmeso wrote:
> G. Branden Robinson wrote in
>  <20220916223236.lmkf3brdwotdn2fd@illithid>:
> |At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
> |> How is anyone supposed to document a sh(1)ell-style manual with
> |> mdoc(7) (i do not know about man(7)) with these settings?
> |
> |By reading the manual, Steffen.
>
> Ok, and you put a lot of effort in it in the last years.

I'd feel more appreciated if I saw more evidence of you reading it.

> But the point is: last week it looked _entirely_ different,

You chose last week to upgrade from a nearly eight-year-old
release.[1] Did you read groff's NEWS file?

> and the locale has not changed! The manual has not changed either.

I know for a fact that "the manual" has changed substantially since
groff 1.22.3. I did a significant amount of work on groff
documentation prior to the 1.22.4 release. Are you referring to some
other manual?

> Just to remind you that the hyphen-minus -> hyphen change was
> committed in March _this_ year.

Yes. After I spent 2+ years advocating it on this mailing list and, as
a small portion of my work, reviewing groff's own ~60 man pages for
correct glyph usage.

> So it is you -- you are changing things backward incompatibly!

No, I am aligning things more closely between typesetters and terminal
devices, to reflect the increasing capabilities of terminal devices on
Unix systems since about the year 2000.

You can restore man pages to the appearance you desire by using the
same character encodings you did when you became accustomed to them:
ASCII or ISO Latin-1. Yes, even using bleeding edge groff Git HEAD to
format them.

> |UTF-8 content follows.
> |
> |groff_char(7):
> ...
>
> Please note again i am doing mdoc(7) here, not mom or ms or my own
> macros.

Using mdoc(7) is no reason not to read groff_char(7). mdoc(7) is a
groff macro package. It does not alter the syntax or repertoire of
groff special characters.
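To illustrate that point (a sketch that is not part of the original message; the introductory sentence is hypothetical), the same special character escapes can be used unchanged inside an mdoc(7) literal display:

```roff
.\" Hypothetical mdoc(7) fragment; the sentence and the command are
.\" illustrative only.  Escapes are still interpreted inside a
.\" literal display, so the special characters work as in man(7).
.Pp
Old-style command substitution can be written as
.Bd -literal -offset indent
i=\(gaecho \(aq\(ti/home\(harun\(aq\(ga
.Ed
```

Here only the macros around the example (.Pp, .Bd, .Ed) are specific to mdoc; the escapes themselves are the ones listed in groff_char(7).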
> |There is also the "Portability" section of groff_man(7) [groff
> |1.22.4] or groff_man_style(7) [groff 1.23].
> |
> |       Several special characters are also widely portable. AT&T
> |       troff
> ...
>
> But there is nothing special.

"Special character" is a piece of *roff terminology. It is startling
to me that you are not already aware of this. If you'd take a moment
to refrain from your multiple expostulations of "WOW!!!", catch your
breath, and oxygenate your brain sufficiently to read the
groff_char(7) man page, you might learn this.

> Input characters are mapped away differently than before.

See above.

> ...
> |       \(ha   Basic Latin circumflex accent (“hat”). Some output
> |              devices replace “^” with U+02C6 (modifier letter
> |              circumflex accent) or similar.
> ...
> |       \(ti   Basic Latin tilde. Some output devices replace “~” with
> |              U+02DC (small tilde) or similar.
>
> But why?

Why what? Why do "some devices replace"...? That's Ingo's wording, if
I recall correctly, but the reason is that some output devices have
larger glyph repertoires than others. This observation has been
commonplace to *roff users at least since Typesetter roff was written
in about 1972.

I don't think I'd use the term "replace"; every *roff output device
defines a mapping from characters to glyphs. In this sense, every
character gets "replaced". Maybe I'll adjust that wording.

> And furthermore: why -Tutf8 that lives on and with fixed-width
> monospace fonts in practically all cases.

I cannot parse this. Please try to express yourself in standard
English.

> And why differently than before?

See above.

> |Or you can just do the brute force thing. From groff 1.23's
> |"PROBLEMS" file:
>
> But this changes manuals written over the last decades to
> something completely different, Branden.

Not correctly written man pages.

> I am coming from 1.22.3. It looked entirely different last week.

You said this already.

> You cannot expect all those people to rewrite all their manuals

https://www.medicalnewstoday.com/articles/320844

I predict the level of effort for most pages to be minimal (some may
not require revision at all), and speaking as someone who has
undertaken a multi-year project to _rewrite documentation_ for groff
specifically, I am thoroughly persuaded that fixing glyph usage errors
in man pages is among the easiest revisions of documentation that a
person can undertake. If you find this task too daunting, then I
cannot help but anticipate that much more significant flaws in your
documentation will go unaddressed.

The presence of incorrect glyphs is likely to frustrate copy-and-paste
operations, or look mildly strange, but is not, in most cases, going
to be a significant barrier to people trying to apply man pages
because in every case, ASCII glyphs are _easier to type_.

In any event I suspect most man pages will get fixed, if at all,
because readers will report bugs. I've met too