RE: Rendering the em dash on the terminal
> From: groff-bounces+jeff_conrad=msn@gnu.org On Behalf Of Dave Kemper
> Sent: Saturday, 24 August, 2024 12:33 PM

> The new logic is this:
>
> .ie '\?\*[.T]\?'\?utf8\?' .char \[em] \[em]\[em]
> .el .char \[em] --
>

Aesthetics
==========
> The motivation is given in the commit log: making \[em] look "more
> like a true em dash, taking up two character cells."

Dunno if taking up two character cells makes it “look more like a
true em dash”; it may be more aesthetically pleasing than two
hyphens.

Dash List
---------
There are situations in which I’m not sure what gives the best
aesthetics. For example, with mm’s DL (dash list) macro, I might
prefer

  —— First item
  —— Next item

to

  -- First item
  -- Next item

Neither is great; far better might be

  — First item
  — Next item

But there may be no easy way to get there from here.

Clarity
=======
> An em dash in any monospace font is hard to distinguish from a
> hyphen and other dash-like glyphs.

Agree. And I think _clarity must trump aesthetics_. A single em
dash is not obviously seen as such. And unlike an en dash (probably
seen as a hyphen by most folks anyway, even in typeset material,
which is why most newspapers seldom use it), the distinction is
important.

Sometimes the distinction is important even with an en dash. A
reasonable rule is that recognition should fail gracefully. An
example might be Oakland’s “Anti Police-Terror Project.”
Properly, “anti” is a prefix and needs a hyphen, but it’s more
complicated when it modifies a compound. Chicago style would use
“Anti–Police Terror Project”; suffice it to say that the failure
here is less than graceful.

Any approach that has an em dash take up two character cells
might lead to confusion in a few instances.

Two-Em Dash
-----------
A two-em dash is often used to indicate omissions: from the
Chicago Manual of Style (18th ed.), § 6.99,

  Admiral N—— and Lady R—— were among the guests

Some folks use a single em dash here, which would look the same
as above.
But actually using two em dashes would give

  Admiral N———— and Lady R———— were among the guests

which isn’t so good.

Three-Em Dash
-------------
A three-em dash is commonly used in a bibliography to indicate
the same author(s) as the previous entry, e.g.,

  Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
  ———. A Strange and Sublime Address. Minerva, 1992.

Input in the normal manner would give

  Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
  ——————. A Strange and Sublime Address. Minerva, 1992.

which seems kinda long. But perhaps it’s just me.

I suppose a workaround might be terminal-specific characters like
‘2m’ and ‘3m’. I long had these as strings, more for ease of
entry than for handling different devices. In this case, though,
it’s not clear how these characters would be handled so there are
clear distinctions among ‘em’, ‘2m’, and ‘3m’. And if the
typographical convention of ‘--’ were to prevail for ‘em’, I’m
not sure how it would apply to ‘2m’ and ‘3m’.

Comments
========
> My first concern is that this motivation is communicated only in the
> commit log, leaving a bit of a head-scratcher to anyone merely reading
> the code. If this logic is kept, its motive should be commented in
> the code.

This seems reasonable. Most folks can probably figure this
out after a bit of head scratching, but it would be nice to
spare them the trouble.

Typographic Convention
======================
> Two em dashes in a row is part of no typographic convention.

Agree. But the ‘--’ convention comes from manuscript preparation
in typewriter days; I wonder how many younger users are even
aware of it.

Copy and Paste
==============
> This will paste very poorly into any text field that uses a
> proportional font.

How often would someone copy and paste from man(1) output? And I
think the goodness or badness would depend on the target; if the
target is text, it might look a bit strange because the ‘——’
sequence isn’t common. If the target is something destined for
output in proportional type, I’m not sure ‘--’ is much better.
The only proper sequence in that case is a single em dash, but as
we all seem to agree, this isn’t great for output to a monospace
terminal.

Full disclosure: I format my man pages as PDF, so I may not be
the best person to comment on the appearance of output to a
monospace device.

Searches
========
> It interferes with greps and other searches: most readers
> seeing two hyphen-like characters in a row in a monospace font
> will conclude that they are in fact two hyphens, the
> longstanding convention, rather than two em dashes.

Would it? I’d probably never think to search for ‘——’, but I
don’t often search for ‘--’, either, because it’s almost always
context dependent. Conceivably, I might search for an em dash
that either precedes or follows a specific text, but such a
search would work with ‘——’.

Don’t throw stones
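
To make the search question above concrete, here is a small Python sketch. It is an illustration only, not anything groff ships. It assumes the formatted output is available as a string, and that \[em] comes out as two U+2014 characters on the utf8 device and as two hyphens elsewhere, per the logic quoted at the top of this message. The name `find_dashes` and the sample lines are made up for the example.

```python
import re

EM = "\u2014"  # EM DASH

# Matches either rendering of the em special character discussed above:
# two em dashes (utf8 device) or two hyphen-minus characters (other devices).
# The pattern is an assumption for illustration, not part of groff.
DASH_PAIR = re.compile(f"(?:{EM}{EM}|--)")


def find_dashes(line: str) -> list[str]:
    """Return every dash pair found in a line of formatted output."""
    return DASH_PAIR.findall(line)


utf8_line = f"clarity{EM}{EM}not aesthetics"
ascii_line = "clarity--not aesthetics"

# A plain search for two hyphens does not see the utf8 rendering.
print("--" in utf8_line)

# A search that allows for both renderings finds the dash either way.
print(find_dashes(utf8_line))
print(find_dashes(ascii_line))
```

Whether a reader would think to write the second kind of search is the open question; the sketch only shows that the two renderings are distinct strings to a search tool.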
Re: Rendering the em dash on the terminal
Hi Jeff,

Good to hear from you! As the new guy, it's always nice for me when a
veteran groff maven chimes in. (Veteran groff detractors, not so much.
😅)

[CCing you just in case; if you'd prefer I didn't, please say so.]

At 2024-08-26T16:41:47-0700, Jeff Conrad wrote:
> > From: groff-bounces+jeff_conrad=msn@gnu.org On Behalf Of Dave Kemper
> > Sent: Saturday, 24 August, 2024 12:33 PM
>
> > The new logic is this:
> >
> > .ie '\?\*[.T]\?'\?utf8\?' .char \[em] \[em]\[em]
> > .el .char \[em] --
> >
>
> Aesthetics
> ==========
> > The motivation is given in the commit log: making \[em] look "more
> > like a true em dash, taking up two character cells."
>
> Dunno if taking up two character cells makes it “look more like a
> true em dash”;

It does on my terminal, xterm using Liberation Sans Mono.

See attachment.

The problem I observed is that an em dash should be close to one em
wide--one em properly considered, that is, as wide as an em quad, or
as wide as a capital letter is from its top to its baseline. Ordinary
or "halfwidth" character cell fonts simply don't look like that.

Terminals _have_ developed support for bi-width fonts. And there
_does exist_ a fullwidth hyphen-minus in Unicode (U+FF0D)...but no
fullwidth em dash.

> it may be more aesthetically pleasing than two hyphens.

That is my view.

> Dash List
> ---------
> There are situations in which I’m not sure what gives the best
> aesthetics. For example, with mm’s DL (dash list) macro, I might
> prefer
>
>   —— First item
>   —— Next item
>
> to
>
>   -- First item
>   -- Next item
>
> Neither is great; far better might be
>
>   — First item
>   — Next item
>
> But there may be no easy way to get there from here.

In groff 1.24, if you redefine the `EM` string, you'll get whatever
dash you want there.

commit 6a4e2e5cecc4a7ef24e3bf6bfe839d7fdade24b6
Author: G. Branden Robinson
Date:   Thu Jul 4 20:01:14 2024 -0500

    [mm]: Use `EM` string as `DL` list item mark.

    * contrib/mm/m.tmac (DL): Use the `EM` string as the mark instead of
      an em dash special character literal.
    * contrib/mm/groff_mm.7.man (Macros) : (Strings) :
    * NEWS: Document this.

> Clarity
> =======
> > An em dash in any monospace font is hard to distinguish from a
> > hyphen and other dash-like glyphs.
>
> Agree. And I think _clarity must trump aesthetics_. A single em
> dash is not obviously seen as such.

The fonts the LWN editor uses seem to render all dash-like symbols the
same.

https://lwn.net/Articles/948720/

> And unlike an en dash (probably seen as a hyphen by most folks anyway,
> even in typeset material, which is why most newspapers seldom use it),
> the distinction is important.
>
> Sometimes the distinction is important even with an en dash. A
> reasonable rule is that recognition should fail gracefully. An
> example might be Oakland’s “Anti Police-Terror Project.”
> Properly, “anti” is a prefix and needs a hyphen, but it’s more
> complicated when it modifies a compound. Chicago style would use
> “Anti–Police Terror Project”; suffice it to say that the failure
> here is less than graceful.

Might be time to resurrect data transfers over FTP.

> Any approach that has an em dash take up two character cells
> might lead to confusion in a few instances.

Possibly. It _is_ a hazard, but a minor one more than offset by the
benefit in clarity. My opinion.

> Two-Em Dash
> -----------
> A two-em dash is often used to indicate omissions: from the
> Chicago Manual of Style (18th ed.), § 6.99,
>
>   Admiral N—— and Lady R—— were among the guests
>
> Some folks use a single em dash here, which would look the same
> as above. But actually using two em dashes would give
>
>   Admiral N———— and Lady R———— were among the guests
>
> which isn’t so good.
>
> Three-Em Dash
> -------------
> A three-em dash is commonly used in a bibliography to indicate
> the same author(s) as the previous entry, e.g.,
>
>   Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
>   ———. A Strange and Sublime Address. Minerva, 1992.
>
> Input in the normal manner would give
>
>   Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
>   ——————. A Strange and Sublime Address. Minerva, 1992.
>
> which seems kinda long. But perhaps it’s just me.
>
> I suppose a workaround might be terminal-specific characters like
> ‘2m’ and ‘3m’. I long had these as strings, more for ease of
> entry than for handling different devices. In this case, though,
> it’s not clear how these characters would be handled so there are
> clear distinctions among ‘em’, ‘2m’, and ‘3m’. And if the
> typographical convention of ‘--’ were to prevail for ‘em’, I’m
> not sure how it would apply to ‘2m’ and ‘3m’.

I despair of cutting these knots. For these relatively persnickety
matters I think I would prefer to trust the document author to define
strings and exercise formatter facilities to achieve the precise result
they desire.
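
The two-em and three-em arithmetic quoted above can be checked with a short Python sketch. It is a rough model under stated assumptions: it treats the utf8 definition as plain string replacement (each em dash in the input becomes two on output), whereas groff does this at the glyph level. The function name `render` and its `em_cells` parameter are hypothetical.

```python
EM = "\u2014"  # EM DASH


def render(text: str, em_cells: int) -> str:
    """Replace each em dash in text with em_cells em dashes.

    em_cells=2 mimics the utf8 definition quoted in this thread, where the
    em special character is defined as two em dashes; em_cells=1 leaves the
    text unchanged.
    """
    return text.replace(EM, EM * em_cells)


# Inputs written "in the normal manner": two and three em dashes.
two_em = f"Admiral N{EM * 2}"
three_em = f"{EM * 3}. A Strange and Sublime Address."

for sample in (two_em, three_em):
    out = render(sample, 2)
    # Print the rendered text and how many em dashes it now contains.
    print(out, out.count(EM))
```

Under this model a two-em dash occupies four cells and a three-em dash six, which is the lengthening described in the quoted examples.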
RE: Rendering the em dash on the terminal
> From: G. Branden Robinson
> Sent: Monday, 26 August, 2024 5:34 PM

> Good to hear from you! As the new guy, it's always nice for me when a
> veteran groff maven chimes in.

Veteran, perhaps, because of age, but rusty in recent years ...

> (Veteran groff detractors, not so much. 😅)
>
> [CCing you just in case; if you'd prefer I didn't, please say so.]
>

Aesthetics
==========
> > Dunno if taking up two character cells makes it “look more like a
> > true em dash”;
>
> It does on my terminal, xterm using Liberation Sans Mono.
>
> See attachment.

I get similar results with Consolas on a Windows console. It looks
more like a real em dash in that it’s wider than one cell (an en?).
Still dunno whether it really looks more like a real em dash.
Different is never the same, and monospace fonts are inherently
poor substitutes for the real thing. There is no substitute for
cubic inches!

> The problem I observed is that an em dash should be close to
> one em wide--one em properly considered, that is, as wide as an
> em quad, or as wide as a capital letter is from its top to its
> baseline. Ordinary or "halfwidth" character cell fonts simply
> don't look like that.

If we consider monospace fonts “halfwidth” (or at least half
something), ‘——’ probably does look like a true em dash. But is
“halfwidth” meaningful outside of CJK?

> > Dash List
> > ---------

> In groff 1.24, if you redefine the `EM` string, you'll get
> whatever dash you want there.

I was unaware that this hasn’t been the case; I checked the AT&T
mmn and mmt files from years ago, and--sure enough--DL uses ‘em’.

This might offer a way of having a different character for a dash
list than elsewhere, but it would eschew the mm tradition of
always using “\*(EM”, whose purpose was to give ‘\(em’ with troff
and ‘--’ with nroff. And what do we do if ‘\(em’ is already
changed to be two em dashes?

Clarity
=======
> The fonts the LWN editor uses seem to render all dash-like
> symbols the same.
>
> https://lwn.net/Articles/948720/

Certainly not the case with any of my editors, though the
distinctions are slight.

> > reasonable rule is that recognition should fail gracefully.

> > Chicago style would use “Anti–Police Terror Project”; suffice
> > it to say that the failure here is less than graceful.

> Might be time to resurrect data transfers over FTP.

I was thinking more of human than data-transmission failures ...

In typeset, “Anti–Police Terror Project” would be easily
distinguished from “Anti-Police Terror Project” but even then,
the average person--who probably wouldn’t know an en dash if it
bit them--would read the two as if they were identical. And for
many, the same may be true for an em dash.

Don’t get me going ...

> > Any approach that has an em dash take up two character cells
> > might lead to confusion in a few instances.
>
> Possibly. It _is_ a hazard, but a minor one more than offset by the
> benefit in clarity. My opinion.

Could well be.

> > Two-Em Dash
> > -----------

> > Three-Em Dash
> > -------------
> >
> > I suppose a workaround might be terminal-specific characters like
> > ‘2m’ and ‘3m’. I long had these as strings, more for ease of
> > entry than for handling different devices. In this case, though,
> > it’s not clear how these characters would be handled so there are
> > clear distinctions among ‘em’, ‘2m’, and ‘3m’. And if the
> > typographical convention of ‘--’ were to prevail for ‘em’, I’m
> > not sure how it would apply to ‘2m’ and ‘3m’.
>
> I despair of cutting these knots. For these relatively persnickety
> matters I think I would prefer to trust the document author to define
> strings and exercise formatter facilities to achieve the precise result
> they desire.

You have more faith than I ... I fear the same result as when
people decided we no longer needed parity bits, freeing up the G1
area for additional characters: everyone had a different idea of
what should go where. That iconv(1) exists seems a testament to
pervasive idiocy.

Comments
========
> > This seems reasonable. Most folks can probably figure this
> > out after a bit of head scratching, but it would be nice to
> > spare them the trouble.
>
> I certainly can add something here.

I think this would help. And it might help to mention it
elsewhere for (most) folks who will never look at the code or the
commit.

Copy and Paste
==============
> > How often would someone copy and paste from man(1) output?
>
> I do this frequently.
>
> https://lists.gnu.org/archive/html/groff/2024-07/msg00062.html

I guess I stand corrected 😊.

> If you have a typesetting device (or file format), use it!

Amen! Kinda why troff (and, with it, Unix) was developed.

> This is the man2html story all over again. Most people produce
> online man pages by scraping and (crudely) transforming
> grotty(1) output. That makes me sad. One of my long-term
> goals in groff development is to get people to stop maintaining
> these scraper-converters by off
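
On the question raised earlier in this message of whether “halfwidth” means anything outside CJK: one place the terms are defined is Unicode's East Asian Width property. The Python sketch below only reads that property through the standard `unicodedata` module for the three characters mentioned in the thread. How a given terminal actually sizes a character is up to the terminal and its font, so this is a look at the character data, not a prediction of what xterm or a Windows console will draw. The helper name `width_class` is made up for the example.

```python
import unicodedata

# The three characters mentioned in the thread.
chars = {
    "hyphen-minus": "\u002d",
    "em dash": "\u2014",
    "fullwidth hyphen-minus": "\uff0d",
}


def width_class(ch: str) -> str:
    """Return the Unicode East Asian Width class of one character."""
    return unicodedata.east_asian_width(ch)


for label, ch in chars.items():
    print(f"U+{ord(ch):04X} {label}: {width_class(ch)}")
```

In that property, `Na` is narrow, `F` is fullwidth, and `A` is ambiguous, meaning the character is treated as narrow or wide depending on context. The em dash falls in the ambiguous class, which is consistent with the observation that there is a fullwidth hyphen-minus but no dedicated fullwidth em dash.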
RE: Rendering the em dash on the terminal
> From: G. Branden Robinson
> Sent: Monday, 26 August, 2024 5:34 PM
> To: groff@gnu.org

Something obvious I overlooked: for a command with long options,
there’s probably something to be said for distinguishing between
‘--’ and a true em dash (‘——’). Another argument for Branden’s
approach.
RE: Rendering the em dash on the terminal
> From: Jeff Conrad
> Sent: Monday, 26 August, 2024 8:39 PM
> To: 'groff@gnu.org'
>
> > From: G. Branden Robinson

Aagh ... from me, not Branden. One of these days I’ll figure this out.

Something obvious I overlooked: for a command with long options,
there’s probably something to be said for distinguishing between
‘--’ and a true em dash (‘——’). Another argument for Branden’s
approach.