RE: Rendering the em dash on the terminal

2024-08-26 Thread Jeff Conrad
> From: groff-bounces+jeff_conrad=msn@gnu.org On Behalf Of Dave Kemper
> Sent: Saturday, 24 August, 2024 12:33 PM

> The new logic is this:
> 
> .ie '\?\*[.T]\?'\?utf8\?' .char \[em] \[em]\[em]
> .el   .char \[em] --
> 

Aesthetics
==
> The motivation is given in the commit log: making \[em] look "more
> like a true em dash, taking up two character cells."

Dunno if taking up two character cells makes it “look more like a
true em dash”; it may be more aesthetically pleasing than two hyphens.

Dash List
-
There are situations in which I’m not sure what gives the best aesthetics.
For example, with mm’s DL (dash list) macro, I might prefer

 —— First item
 —— Next item

to

 -- First item
 -- Next item

Neither is great; far better might be

 — First item
 — Next item

But there may be no easy way to get there from here.

Clarity
===
> An em dash in any monospace font is hard to distinguish from a
> hyphen and other dash-like glyphs.

Agree.  And I think _clarity must trump aesthetics_.  A single em
dash is not obviously seen as such.  And unlike an en dash
(probably seen as a hyphen by most folks anyway, even in typeset
material, which is why most newspapers seldom use it), the
distinction is important.

Sometimes the distinction is important even with an en dash.  A
reasonable rule is that recognition should fail gracefully.  An
example might be Oakland’s “Anti Police-Terror Project.”
Properly, “anti” is a prefix and needs a hyphen, but it’s more
complicated when it modifies a compound.  Chicago style would use
“Anti–Police Terror Project”; suffice it to say that the failure
here is less than graceful.

Any approach that has an em dash take up two character cells
might lead to confusion in a few instances.

Two-Em Dash
---
A two-em dash is often used to indicate omissions: from the
Chicago Manual of Style (18th ed.), § 6.99,

Admiral N—— and Lady R—— were among the guests

Some folks use a single em dash here, which would look the same
as above.  But actually using two em dashes would give

Admiral N———— and Lady R———— were among the guests

which isn’t so good.

Three-Em Dash
-
A three-em dash is commonly used in a bibliography to indicate
the same author(s) as the previous entry, e.g.,

Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
———. A Strange and Sublime Address. Minerva, 1992.

Input in the normal manner would give

Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
——————. A Strange and Sublime Address. Minerva, 1992.

which seems kinda long. But perhaps it’s just me.

I suppose a workaround might be terminal-specific characters like
‘2m’ and ‘3m’.  I long had these as strings, more for ease of
entry than for handling different devices.  In this case, though,
it’s not clear how these characters would be handled so there are
clear distinctions among ‘em’, ‘2m’, and ‘3m’.  And if the
typographical convention of ‘--’ were to prevail for ‘em’, I’m
not sure how it would apply to ‘2m’ and ‘3m’.
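
A minimal sketch of what such definitions could look like.  The glyph
names ‘2m’ and ‘3m’ are hypothetical (groff does not define them), and
the hyphen runs chosen for the terminal are arbitrary; whether a
reader could tell them apart is exactly the open question above.

```groff
.\" Hypothetical glyph names; groff itself does not define 2m or 3m.
.ie n \{\
.\" Terminal: fixed runs of hyphens, chosen arbitrarily for this sketch.
.  char \[2m] ----
.  char \[3m] ------
.\}
.el \{\
.\" Typesetter: build the longer dashes from abutting em dashes.
.  char \[2m] \[em]\[em]
.  char \[3m] \[em]\[em]\[em]
.\}
```

With these in place, the Chicago example would be entered as
“Admiral N\[2m] and Lady R\[2m] were among the guests”.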

Comments

> My first concern is that this motivation is communicated only in the
> commit log, leaving a bit of a head-scratcher to anyone merely reading
> the code.  If this logic is kept, its motive should be commented in
> the code.

This seems reasonable.  Most folks can probably figure this out
after a bit of head scratching, but it would be nice to spare
them the trouble.
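
Something along these lines might do.  The request lines are the ones
quoted at the top of this message; the comment wording is only a
suggestion, paraphrasing the commit log and the typewriter convention
mentioned below, and is not taken from the groff sources.

```groff
.\" On UTF-8 terminals, draw \[em] as two em dashes so that it fills
.\" two character cells and looks more like a true em dash; elsewhere,
.\" fall back to the typewriter convention of two hyphens.
.ie '\?\*[.T]\?'\?utf8\?' .char \[em] \[em]\[em]
.el   .char \[em] --
```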

Typographic Convention
==
> Two em dashes in a row is part of no typographic convention.

Agree.  But the ‘--’ convention comes from manuscript preparation
in typewriter days; I wonder how many younger users are even
aware of it.

Copy and Paste
==
> This will paste very poorly into any text field that uses a
> proportional font.

How often would someone copy and paste from man(1) output?  And I
think the goodness or badness would depend on the target; if the
target is text, it might look a bit strange because the ‘——’
sequence isn’t common.  If the target is something destined for
output in proportional type, I’m not sure ‘--’ is much better.
The only proper sequence in that case is a single em dash, but as
we all seem to agree, this isn’t great for output to a monospace
terminal.

Full disclosure: I format my man pages as PDF, so I may not be
the best person to comment on the appearance of output to a
monospace device.

Searches

> It interferes with greps and other searches: most readers
> seeing two hyphen-like characters in a row in a monospace font
> will conclude that they are in fact two hyphens, the
> longstanding convention, rather than two em dashes.

Would it?  I’d probably never think to search for ‘——’, but I
don’t often search for ‘--’, either, because it’s almost always
context dependent.  Conceivably, I might search for an em dash
that either precedes or follows a specific text, but such a
search would work with ‘——’.

Don’t throw stones

Re: Rendering the em dash on the terminal

2024-08-26 Thread G. Branden Robinson
Hi Jeff,

Good to hear from you!  As the new guy, it's always nice for me when a
veteran groff maven chimes in.

(Veteran groff detractors, not so much. 😅)

[CCing you just in case; if you'd prefer I didn't, please say so.]

At 2024-08-26T16:41:47-0700, Jeff Conrad wrote:
> > From: groff-bounces+jeff_conrad=msn@gnu.org On Behalf Of Dave Kemper
> > Sent: Saturday, 24 August, 2024 12:33 PM
> 
> > The new logic is this:
> > 
> > .ie '\?\*[.T]\?'\?utf8\?' .char \[em] \[em]\[em]
> > .el   .char \[em] --
> > 
> 
> Aesthetics
> ==
> > The motivation is given in the commit log: making \[em] look "more
> > like a true em dash, taking up two character cells."
> 
> Dunno if taking up two character cells makes it “look more like a
> true em dash”;

It does on my terminal, xterm using Liberation Sans Mono.

See attachment.

The problem I observed is that an em dash should be close to one em
wide--one em properly considered, that is, as wide as an em quad, or as
wide as a capital letter is from its top to its baseline.  Ordinary or
"halfwidth" character cell fonts simply don't look like that.

Terminals _have_ developed support for bi-width fonts.  And
there _does exist_ a fullwidth hyphen-minus in Unicode (U+FF0D)...but no
fullwidth em dash.

> it may be more aesthetically pleasing than two hyphens.

That is my view.

> Dash List
> -
> There are situations in which I’m not sure what gives the best
> aesthetics.  For example, with mm’s DL (dash list) macro, I might
> prefer
> 
>  —— First item
>  —— Next item
> 
> to
> 
>  -- First item
>  -- Next item
> 
> Neither is great; far better might be
> 
>  — First item
>  — Next item
> 
> But there may be no easy way to get there from here.

In groff 1.24, if you redefine the `EM` string, you'll get whatever dash
you want there.

commit 6a4e2e5cecc4a7ef24e3bf6bfe839d7fdade24b6
Author: G. Branden Robinson
Date:   Thu Jul 4 20:01:14 2024 -0500

    [mm]: Use `EM` string as `DL` list item mark.

    * contrib/mm/m.tmac (DL): Use the `EM` string as the mark instead of an
      em dash special character literal.

    * contrib/mm/groff_mm.7.man (Macros) :
      (Strings) :
    * NEWS: Document this.
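
A minimal sketch of how that might be used, assuming groff 1.24 or
later with the change above.  The choice of an en dash as the mark is
arbitrary and only for illustration; any glyph or string would do.

```groff
.\" Format with: groff -mm -Tutf8 (groff 1.24 or later assumed)
.\" Assumption: an en dash is the mark we want; any glyph would do.
.ds EM \[en]
.DL
.LI
First item
.LI
Next item
.LE
```

Because the document's own `.ds` runs after the mm package has been
loaded, it should override the package's default `EM`.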

> Clarity
> ===
> > An em dash in any monospace font is hard to distinguish from a
> > hyphen and other dash-like glyphs.
> 
> Agree.  And I think _clarity must trump aesthetics_.  A single em
> dash is not obviously seen as such.

The fonts the LWN editor uses seem to render all dash-like symbols the
same.

https://lwn.net/Articles/948720/

> And unlike an en dash (probably seen as a hyphen by most folks anyway,
> even in typeset material, which is why most newspapers seldom use it),
> the distinction is important.
> 
> Sometimes the distinction is important even with an en dash.  A
> reasonable rule is that recognition should fail gracefully.  An
> example might be Oakland’s “Anti Police-Terror Project.”
> Properly, “anti” is a prefix and needs a hyphen, but it’s more
> complicated when it modifies a compound.  Chicago style would use
> “Anti–Police Terror Project”; suffice it to say that the failure
> here is less than graceful.

Might be time to resurrect data transfers over FTP.

> Any approach that has an em dash take up two character cells
> might lead to confusion in a few instances.

Possibly.  It _is_ a hazard, but a minor one more than offset by the
benefit in clarity.  My opinion.

> Two-Em Dash
> ---
> A two-em dash is often used to indicate omissions: from the
> Chicago Manual of Style (18th ed.), § 6.99,
> 
> Admiral N—— and Lady R—— were among the guests
> 
> Some folks use a single em dash here, which would look the same
> as above.  But actually using two em dashes would give
> 
> Admiral N———— and Lady R———— were among the guests
> 
> which isn’t so good.
> 
> Three-Em Dash
> -
> A three-em dash is commonly used in a bibliography to indicate
> the same author(s) as the previous entry, e.g.,
> 
> Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
> ———. A Strange and Sublime Address. Minerva, 1992.
> 
> Input in the normal manner would give
> 
> Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
> ——————. A Strange and Sublime Address. Minerva, 1992.
> 
> which seems kinda long. But perhaps it’s just me.
> 
> I suppose a workaround might be terminal-specific characters like
> ‘2m’ and ‘3m’.  I long had these as strings, more for ease of
> entry than for handling different devices.  In this case, though,
> it’s not clear how these characters would be handled so there are
> clear distinctions among ‘em’, ‘2m’, and ‘3m’.  And if the
> typographical convention of ‘--’ were to prevail for ‘em’, I’m
> not sure how it would apply to ‘2m’ and ‘3m’.

I despair of cutting these knots.  For these relatively persnickety
matters I think I would prefer to trust the document author to define
strings and exercise formatter facilities to achieve the precise result
they desire.

RE: Rendering the em dash on the terminal

2024-08-26 Thread Jeff Conrad
> From: G. Branden Robinson 
> Sent: Monday, 26 August, 2024 5:34 PM

> Good to hear from you!  As the new guy, it's always nice for me when a
> veteran groff maven chimes in.

Veteran, perhaps, because of age, but rusty in recent years ...

> (Veteran groff detractors, not so much. 😅)
> 
> [CCing you just in case; if you'd prefer I didn't, please say so.]
> 
Aesthetics
==
> > Dunno if taking up two character cells makes it “look more like a
> > true em dash”;
> 
> It does on my terminal, xterm using Liberation Sans Mono.
> 
> See attachment.

I get similar results with Consolas on a Windows console.  It
looks more like a real em dash in that it’s wider than one cell
(an en?).  Still dunno whether it really looks more like a real
em dash.  Different is never the same, and monospace fonts are
inherently poor substitutes for the real thing.  There is no
substitute for cubic inches!

> The problem I observed is that an em dash should be close to
> one em wide--one em properly considered, that is, as wide as an
> em quad, or as wide as a capital letter is from its top to its
> baseline.  Ordinary or "halfwidth" character cell fonts simply
> don't look like that.

If we consider monospace fonts “halfwidth” (or at least half
something), ‘——’ probably does look like a true em dash.  But is
“halfwidth” meaningful outside of CJK?

> > Dash List
> > -
> In groff 1.24, if you redefine the `EM` string, you'll get
> whatever dash you want there.

I was unaware that this hasn’t been the case; I checked the AT&T
mmn and mmt files from years ago, and--sure enough--DL uses ‘em’.
This might offer a way of having a different character for a dash
list than elsewhere, but it would eschew the mm tradition of
always using “\*(EM”, whose purpose was to give ‘\(em’ with troff
and ‘--’ with nroff.  And what do we do if ‘\(em’ is already
changed to be two em dashes?
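
For reference, the traditional split can be sketched roughly as
follows.  This is an illustration of the idea, not copied from the
AT&T or groff mm sources.

```groff
.\" Sketch of the traditional split, not copied from any mm source.
.ie n .ds EM --
.el   .ds EM \(em
```

Under this sketch, ‘\*(EM’ on a terminal would still give ‘--’ even
where ‘\(em’ itself has been redefined as two em dashes, so the two
would no longer look alike there.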

Clarity
===
> The fonts the LWN editor uses seem to render all dash-like
> symbols the same.
>
> https://lwn.net/Articles/948720/

Certainly not the case with any of my editors, though the
distinctions are slight.

> > reasonable rule is that recognition should fail gracefully.
> > Chicago style would use “Anti–Police Terror Project”; suffice
> > it to say that the failure here is less than graceful.

> Might be time to resurrect data transfers over FTP.

I was thinking more of human than data-transmission failures ...
In typeset material,

“Anti–Police Terror Project”

would be easily distinguished from

“Anti-Police Terror Project”

but even then, the average person--who probably wouldn’t know an
en dash if it bit them--would read the two as if they were
identical.  And for many, the same may be true for an em dash.
Don’t get me going ...

> > Any approach that has an em dash take up two character cells
> > might lead to confusion in a few instances.
> 
> Possibly.  It _is_ a hazard, but a minor one more than offset by the
> benefit in clarity.  My opinion.

Could well be.

> > Two-Em Dash
> > ---
> > Three-Em Dash
> > -
> >
> > I suppose a workaround might be terminal-specific characters like
> > ‘2m’ and ‘3m’.  I long had these as strings, more for ease of
> > entry than for handling different devices.  In this case, though,
> > it’s not clear how these characters would be handled so there are
> > clear distinctions among ‘em’, ‘2m’, and ‘3m’.  And if the
> > typographical convention of ‘--’ were to prevail for ‘em’, I’m
> > not sure how it would apply to ‘2m’ and ‘3m’.
> 
> I despair of cutting these knots.  For these relatively persnickety
> matters I think I would prefer to trust the document author to define
> strings and exercise formatter facilities to achieve the precise result
> they desire.

You have more faith than I ...  I fear the same result as when
people decided we no longer needed parity bits, freeing up the G1
area for additional characters: everyone had a different idea of
what should go where.  That iconv(1) exists seems a testament to
pervasive idiocy.

Comments

> > This seems reasonable.  Most folks can probably figure this
> > out after a bit of head scratching, but it would be nice to
> > spare them the trouble.
>
> I certainly can add something here.

I think this would help.  And it might help to mention it
elsewhere for (most) folks who will never look at the code or the
commit.

Copy and Paste
==
> > How often would someone copy and paste from man(1) output?
> 
> I do this frequently.
> 
> https://lists.gnu.org/archive/html/groff/2024-07/msg00062.html

I guess I stand corrected 😊.

> If you have a typesetting device (or file format), use it!

Amen! Kinda why troff (and, with it, Unix) was developed.

> This is the man2html story all over again.  Most people produce
> online man pages by scraping and (crudely) transforming
> grotty(1) output.  That makes me sad.  One of my long-term
> goals in groff development is to get people to stop maintaining
> these scraper-converters by off

RE: Rendering the em dash on the terminal

2024-08-26 Thread Jeff Conrad
> From: G. Branden Robinson 
> Sent: Monday, 26 August, 2024 5:34 PM
> To: groff@gnu.org

Something obvious I overlooked: for a command with long options,
there’s probably something to be said for distinguishing between
‘--’ and a true em dash (‘——’).  Another argument for Branden’s
approach.




RE: Rendering the em dash on the terminal

2024-08-26 Thread Jeff Conrad
> From: Jeff Conrad 
> Sent: Monday, 26 August, 2024 8:39 PM
> To: 'groff@gnu.org' 

> > From: G. Branden Robinson 

Aagh ... from me, not Branden.  One of these days I’ll figure
this out.

Something obvious I overlooked: for a command with long options,
there’s probably something to be said for distinguishing between
‘--’ and a true em dash (‘——’).  Another argument for Branden’s
approach.
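
A small illustration, as a fragment of a hypothetical man page (the
option name is made up):

```groff
.\" Fragment of a hypothetical man page; the option name is made up.
.TP
.B \-\-quiet
Suppress warnings\[em]errors are still reported.
```

With the UTF-8 branch of the logic quoted at the start of the thread,
the tag should appear as ‘--quiet’ and the dash in the description as
‘——’; with the fallback branch, both would appear as ‘--’.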



