1.23: UTF-8 device: more display oddities

2022-09-16 Thread Steffen Nurpmeso
Hello.

Letting aside the hyphen-minus -> hyphen thing that i fixed for me
locally, there is also the problem that

  ` U+0060, GRAVE ACCENT, "backtick"

is displayed as

  ‘ U+2018, LEFT SINGLE QUOTATION MARK

which, in Liberation Mono (at least!), reverses the direction
of the tick.

I was looking at a manual which uses backtick syntax notation for
sh(1)ell commands (aka i=`echo one`, not new-style i=$(echo one)),
and it _really_ looks strange.

Could something be done about this, please?
Thank you.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: 1.23: UTF-8 device: more display oddities

2022-09-16 Thread Steffen Nurpmeso
Steffen Nurpmeso wrote in
 <20220916213112.5dabw%stef...@sdaoden.eu>:
 |Hello.
 |
 |Letting aside the hyphen-minus -> hyphen thing that i fixed for me
 |locally, there is also the problem that
 |
 |  ` U+0060, GRAVE ACCENT, "backtick"
 |
 |is displayed as
 |
 |  ‘ U+2018, LEFT SINGLE QUOTATION MARK

Also

  ~ U+007E, TILDE

is displayed as

  ˜ U+02DC, SMALL TILDE

which sits at the height of an accent here, for example the

  ^ U+005E, CIRCUMFLEX ACCENT

Putting it all together it really looks totally odd here:

  i=`echo '~/home^run'`

becomes

  i=‘‘echo ’˜/homeˆrun’‘’

How is anyone supposed to document a sh(1)ell-style manual with
mdoc(7) (i do not know about man(7)) with these settings?

--steffen



Re: 1.23: UTF-8 device: more display oddities

2022-09-16 Thread G. Branden Robinson
At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
>  |Letting aside the hyphen-minus -> hyphen thing that i fixed for me
>  |locally, there is also the problem that
>  |
>  |  ` U+0060, GRAVE ACCENT, "backtick"
>  |
>  |is displayed as
>  |
>  |  ‘ U+2018, LEFT SINGLE QUOTATION MARK
> 
> Also
> 
>   ~ U+007E, TILDE
> 
> is displayed as
> 
>   ˜ U+02DC, SMALL TILDE
> 
> which sits at the height of an accent here, for example the
> 
>   ^ U+005E, CIRCUMFLEX ACCENT
> 
> Putting it all together it really looks totally odd here:
> 
>   i=`echo '~/home^run'`
> 
> becomes
> 
>   i=‘‘echo ’˜/homeˆrun’‘’
> 
> How is anyone supposed to document a sh(1)ell-style manual with
> mdoc(7) (i do not know about man(7)) with these settings?

By reading the manual, Steffen.

UTF-8 content follows.

groff_char(7):
...
   On ISO systems, code points in the range 33–126 comprise a common
   set of printable glyphs in all of the aforementioned ISO
   character encoding standards.  It is this character set and (with
   some noteworthy exceptions) the corresponding glyph repertoire
   for which AT&T troff was implemented.
...
   The table below presents the seven exceptional code points with
   their typical keycap engravings, their glyph mappings and
   semantics in roff systems, and the escape sequences producing the
   Unicode basic Latin character they replace.  The first, the
   neutral double quote, is a partial exception because it does
   represent itself, but since the roff language also uses it to
   quote macro arguments, groff supports a special character escape
   sequence as an alternative form so that the glyph can be easily
   included in macro arguments without requiring the user to master
   the quoting rules that AT&T troff required in that context.
   (Some requests, like ds, also treat " non‐literally.)
   Furthermore, not all of the special character escape sequences
   are portable to AT&T troff and all of its descendants; these
   groff extensions are presented using its special character form
   \[], whereas portable special character escape sequences are
   shown in the traditional \( form.  \- and \e are portable to all
   known troffs.  \e means “the glyph of the current escape
   character”; it therefore can produce unexpected output if the ec
   request is used.  On devices with a limited glyph repertoire,
   glyphs in the “keycap” and “appearance” columns on the same row
   of the table may look identical; except for the neutral double
   quote, this will not be the case on more‐capable devices.  Review
   your document using as many different output devices as possible.

  ┌──────────────────────────────────────────────────────────────────┐
  │Keycap   Appearance and meaning   Special character and meaning   │
  ├──────────────────────────────────────────────────────────────────┤
  │"        " neutral double quote   \[dq] neutral double quote      │
  │'        ’ closing single quote   \[aq] neutral apostrophe        │
  │-        ‐ hyphen                 \- or \[-] minus sign/Unix dash │
  │\        (escape character)       \e or \[rs] reverse solidus     │
  │^        ˆ modifier circumflex    \(ha circumflex/caret/“hat”     │
  │`        ‘ opening single quote   \(ga grave accent               │
  │~        ˜ modifier tilde         \(ti tilde                      │
  └──────────────────────────────────────────────────────────────────┘

There is also the "Portability" section of groff_man(7) [groff 1.22.4]
or groff_man_style(7) [groff 1.23].

   Several special characters are also widely portable.  AT&T troff
   did not define the reverse solidus or quotation characters listed
   below, but any of its descendants, like Plan 9 or Solaris troff,
   can support them by defining their glyphs in font description
   files; see groff_font(5).

   \-     Minus sign or basic Latin hyphen‐minus.  This escape
          sequence produces the Unix command‐line option dash in the
          output.  “-” is a hyphen in the roff language; some output
          devices replace it with U+2010 (hyphen) or similar.

   \(aq   Basic Latin neutral apostrophe.  Some output devices
          replace “'” with a right single quotation mark.

   \(oq
   \(cq   Opening (left) and closing (right) single quotation marks.
          Use these for paired directional single quotes, ‘like
          this’.

   \(dq   Basic Latin quotation mark (double quote).  Use in macro
          calls to prevent ‘"” from being interpreted as beginning a
          quoted argument, or simply for readability.

                 .TP
                 .BI "split \(dq" text \(dq

   \(lq
   \(rq   Left and right double quotation marks.  Use these for
          paired directional double quotes, “like this”.
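
A minimal sketch of how the shell example quoted earlier in this thread
could be written with the special characters listed above, so that the
backtick, apostrophe, tilde, and caret come out as their basic Latin
forms on a UTF-8 terminal.  The choice of .Dl for mdoc(7) and of a
no-fill block for man(7) is an assumption made for illustration; neither
macro is prescribed anywhere in the thread.

```roff
.\" Sketch only: the thread's example  i=`echo '~/home^run'`
.\" written with special character escapes instead of bare ASCII.
.\"   \(ga = grave accent, \(aq = neutral apostrophe,
.\"   \(ti = tilde,        \(ha = circumflex/caret
.\"
.\" mdoc(7) form (assumption: a one-line literal display via .Dl):
.Dl i=\(gaecho \(aq\(ti/home\(harun\(aq\(ga
.\"
.\" man(7) form (assumption: a plain no-fill block):
.nf
i=\(gaecho \(aq\(ti/home\(harun\(aq\(ga
.fi
```

Only one of the two forms belongs in a given page, depending on the
macro package in use.  Option dashes would take \- in the same way.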

 

Re: 1.23: UTF-8 device: more display oddities

2022-09-16 Thread Steffen Nurpmeso
G. Branden Robinson wrote in
 <20220916223236.lmkf3brdwotdn2fd@illithid>:
 |At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
 ..
 |>   i=`echo '~/home^run'`
 |> 
 |> becomes
 |> 
 |>   i=‘‘echo ’˜/homeˆrun’‘’
 |> 
 |> How is anyone supposed to document a sh(1)ell-style manual with
 |> mdoc(7) (i do not know about man(7)) with these settings?
 |
 |By reading the manual, Steffen.

Ok, and you put a lot of effort in it in the last years.

But the point is: last week it looked _entirely_ different, and
the locale has not changed!  The manual has not changed either.
Just to remind you that the hyphen-minus -> hyphen change was
committed in March _this_ year.
So it is you -- you are changing things backward incompatibly!

 |UTF-8 content follows.
 |
 |groff_char(7):
 ...

Please note again i am doing mdoc(7) here, not mom or ms or my own
macros.

 |There is also the "Portability" section of groff_man(7) [groff 1.22.4]
 |or groff_man_style(7) [groff 1.23].
 |
 |   Several special characters are also widely portable.  AT&T troff
 ...

But there is nothing special.  Input characters are mapped away
differently than before.

  ...
 |   \(ha   Basic Latin circumflex accent (“hat”).  Some output
 |  devices replace “^” with U+02C6 (modifier letter
 |  circumflex accent) or similar.
 ...
 |   \(ti   Basic Latin tilde.  Some output devices replace “~” with
 |  U+02DC (small tilde) or similar.

But why?  And furthermore: why -Tutf8 that lives on and with
fixed-width monospace fonts in practically all cases.  And why
differently than before?

 |Or you can just do the brute force thing.  From groff 1.23's "PROBLEMS"
 |file:

But this changes manuals written over the last decades to
something completely different, Branden.

I am coming from 1.22.3.  It looked entirely different last week.

You cannot expect all those people to rewrite all their manuals
because you feel like mapping monospace -Tutf8 to be on par with
-Tpdf with all its font powers (used or not)?
I really do not understand these decisions.

Please note also mandoc (at least the version i have here) renders
it the way i _expect_.

Maybe there is a reason why now also Apple i think switches away
from groff to mandoc?

 ...
 |* When viewing man pages, some characters on my UTF-8 terminal emulator
 |  look funny or copy-and-paste wrong.  Why?
 |
 |Some Unicode Basic Latin ("ASCII") input characters are mapped to
 |non-Basic Latin code points in output for consistency with other output
 |devices, like PDF.  See groff_man_style(7) and groff_char(7) for correct
 ...

Uh!

  ...
 |However, many man pages are written in ignorance of the correct special
 |characters to obtain the desired glyphs.  You can conceal these errors

Heh!  _Exactly_!

  ...
 |by adding the following to your site-local man(7) configuration.  The
 |file is called "man.local"; its installation directory depends on how
 |groff was configured when it was built.
 |
 |--- start ---
 |.if '\*[.T]'utf8' \{\
 |.  char ' \[aq]
 |.  char - \-
 |.  char ^ \[ha]
 |.  char ` \[ga]
 |.  char ~ \[ti]
 |.\}

You know, if you would provide a commented-out setting to change
the decade old default behaviour to what you feel is more modern,
or "better", _then_ i could understand it.
I mean i produce backward incompatible changes myself all the
time, but i give plenty of hints.  For example

  $ 

Re: 1.23: UTF-8 device: more display oddities

2022-09-16 Thread G. Branden Robinson
At 2022-09-17T01:00:26+0200, Steffen Nurpmeso wrote:
> G. Branden Robinson wrote in
>  <20220916223236.lmkf3brdwotdn2fd@illithid>:
>  |At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
>  |> How is anyone supposed to document a sh(1)ell-style manual with
>  |> mdoc(7) (i do not know about man(7)) with these settings?
>  |
>  |By reading the manual, Steffen.
> 
> Ok, and you put a lot of effort in it in the last years.

I'd feel more appreciated if I saw more evidence of you reading it.

> But the point is: last week it looked _entirely_ different,

You chose last week to upgrade from a nearly eight year-old release.[1]

Did you read groff's NEWS file?

> and the locale has not changed!  The manual has not changed either.

I know for a fact that "the manual" has changed substantially since
groff 1.22.3.  I did a significant amount of work on groff documentation
prior to the 1.22.4 release.

Are you referring to some other manual?

> Just to remind you that the hyphen-minus -> hyphen change was committed
> in March _this_ year.

Yes.  After I spent 2+ years advocating it on this mailing list and, as
a small portion of my work, reviewing groff's own ~60 man pages for
correct glyph usage.

> So it is you -- you are changing things backward incompatibly!

No, I am aligning things more closely between typesetters and terminal
devices, to reflect the increasing capabilities of terminal devices on
Unix systems since about the year 2000.

You can restore man pages to the appearance you desire by using the same
character encodings you did when you became accustomed to them: ASCII or
ISO Latin-1.  Yes, even using bleeding edge groff Git HEAD to format
them.

>  |UTF-8 content follows.
>  |
>  |groff_char(7):
>  ...
> 
> Please note again i am doing mdoc(7) here, not mom or ms or my own
> macros.

Using mdoc(7) is no reason not to read groff_char(7).  mdoc(7) is a
groff macro package.  It does not alter the syntax or repertoire of
groff special characters.

>  |There is also the "Portability" section of groff_man(7) [groff
>  |1.22.4] or groff_man_style(7) [groff 1.23].
>  |
>  |   Several special characters are also widely portable.  AT&T
>  |   troff
>  ...
> 
> But there is nothing special.

"Special character" is a piece of *roff terminology.  It is startling to
me that you are not already aware of this.

If you'd take a moment to refrain from your multiple expostulations of
"WOW!!!", catch your breath, and oxygenate your brain sufficiently to
read the groff_char(7) man page, you might learn this.

> Input characters are mapped away differently than before.

See above.

>   ...
>  |   \(ha   Basic Latin circumflex accent (“hat”).  Some output
>  |  devices replace “^” with U+02C6 (modifier letter
>  |  circumflex accent) or similar.
>  ...
>  |   \(ti   Basic Latin tilde.  Some output devices replace “~” with
>  |  U+02DC (small tilde) or similar.
> 
> But why?

Why what?  Why do "some devices replace"...?  That's Ingo's wording, if
I recall correctly, but the reason is that some output devices have
larger glyph repertoires than others.  This observation has been
commonplace to *roff users at least since Typesetter roff was written in
about 1972.

I don't think I'd use the term "replace"; every *roff output device
defines a mapping from characters to glyphs.  In this sense, every
character gets "replaced".  Maybe I'll adjust that wording.

> And furthermore: why -Tutf8 that lives on and with fixed-width
> monospace fonts in practically all cases.

I cannot parse this.  Please try to express yourself in standard
English.

> And why differently than before?

See above.

>  |Or you can just do the brute force thing.  From groff 1.23's
>  |"PROBLEMS" file:
> 
> But this changes manuals written over the last decades to
> something completely different, Branden.

Not correctly written man pages.

> I am coming from 1.22.3.  It looked entirely different last week.

You said this already.

> You cannot expect all those people to rewrite all their manuals

https://www.medicalnewstoday.com/articles/320844

I predict the level of effort for most pages to be minimal (some may not
require revision at all), and speaking as someone who has undertaken a
multi-year project to _rewrite documentation_ for groff specifically, I
am thoroughly persuaded that fixing glyph usage errors in man pages is
among the easiest revisions of documentation that a person can
undertake.  If you find this task too daunting, then I cannot help but
anticipate that much more significant flaws in your documentation will
go unaddressed.

The presence of incorrect glyphs is likely to frustrate copy-and-paste
operations, or look mildly strange, but is not, in most cases, going to
be a significant barrier to people trying to apply man pages because in
every case, ASCII glyphs are _easier to type_.

In any event I suspect most man pages will get fixed, if at all, because
readers will report bugs.  I've met too