> From: G. Branden Robinson <g.branden.robin...@gmail.com>
> Sent: Monday, 26 August, 2024 5:34 PM

> Good to hear from you! As the new guy, it's always nice for me when a
> veteran groff maven chimes in.

Veteran, perhaps, because of age, but rusty in recent years ...

> (Veteran groff detractors, not so much. 😅)
>
> [CCing you just in case; if you'd prefer I didn't, please say so.]

Aesthetics
==========

> > Dunno if taking up two character cells makes it “look more like a
> > true em dash”;
>
> It does on my terminal, xterm using Liberation Sans Mono.
>
> See attachment.

I get similar results with Consolas on a Windows console. It looks more like a real em dash in that it’s wider than one cell (an en?). Still dunno whether it really looks more like a real em dash. Different is never the same, and monospace fonts are inherently poor substitutes for the real thing. There is no substitute for cubic inches!

> The problem I observed is that an em dash should be close to
> one em wide--one em properly considered, that is, as wide as an
> em quad, or as wide as a capital letter is from its top to its
> baseline. Ordinary or "halfwidth" character cell fonts simply
> don't look like that.

If we consider monospace fonts “halfwidth” (or at least half something), ‘——’ probably does look like a true em dash. But is “halfwidth” meaningful outside of CJK?

> > Dash List
> > ---------
>
> In groff 1.24, if you redefine the `EM` string, you'll get
> whatever dash you want there.

I was unaware that this hasn’t been the case; I checked the AT&T mmn and mmt files from years ago, and--sure enough--DL uses ‘em’. This might offer a way of having a different character for a dash list than elsewhere, but it would eschew the mm tradition of always using “\*(EM”, whose purpose was to give ‘\(em’ with troff and ‘--’ with nroff. And what do we do if ‘\(em’ is already changed to be two em dashes?

Clarity
=======

> The fonts the LWN editor uses seem to render all dash-like
> symbols the same.
>
> https://lwn.net/Articles/948720/

Certainly not the case with any of my editors, though the distinctions are slight.

> > reasonable rule is that recognition should fail gracefully.
> > Chicago style would use “Anti–Police Terror Project”; suffice
> > it to say that the failure here is less than graceful.
>
> Might be time to resurrect data transfers over FTP.

I was thinking more of human than data-transmission failures ... In typeset, “Anti–Police Terror Project” would be easily distinguished from “Anti-Police Terror Project” but even then, the average person--who probably wouldn’t know an en dash if it bit them--would read the two as if they were identical. And for many, the same may be true for an em dash. Don’t get me going ...

> > Any approach that has an em dash take up two character cells
> > might lead to confusion in a few instances.
>
> Possibly. It _is_ a hazard, but a minor one more than offset by the
> benefit in clarity. My opinion.

Could well be.

> > Two-Em Dash
> > -----------
> > Three-Em Dash
> > -------------
> >
> > I suppose a workaround might be terminal-specific characters like
> > ‘2m’ and ‘3m’. I long had these as strings, more for ease of
> > entry than for handling different devices. In this case, though,
> > it’s not clear how these characters would be handled so there are
> > clear distinctions among ‘em’, ‘2m’, and ‘3m’. And if the
> > typographical convention of ‘--’ were to prevail for ‘em’, I’m
> > not sure how it would apply to ‘2m’ and ‘3m’.
>
> I despair of cutting these knots. For these relatively persnickety
> matters I think I would prefer to trust the document author to define
> strings and exercise formatter facilities to achieve the precise result
> they desire.

You have more faith than I ... I fear the same result as when people decided we no longer needed parity bits, freeing up the G1 area for additional characters: everyone had a different idea of what should go where.
That iconv(1) exists seems a testament to pervasive idiocy.

Comments
========

> > This seems reasonable. Most folks can probably figure this
> > out after a bit of head scratching, but it would be nice to
> > spare them the trouble.
>
> I certainly can add something here.

I think this would help. And it might help to mention it elsewhere for (most) folks who will never look at the code or the commit.

Copy and Paste
==============

> > How often would someone copy and paste from man(1) output?
>
> I do this frequently.
>
> https://lists.gnu.org/archive/html/groff/2024-07/msg00062.html

I guess I stand corrected 😊.

> If you have a typesetting device (or file format), use it!

Amen! Kinda why troff (and, with it, Unix) was developed.

> This is the man2html story all over again. Most people produce
> online man pages by scraping and (crudely) transforming
> grotty(1) output. That makes me sad. One of my long-term
> goals in groff development is to get people to stop maintaining
> these scraper-converters by offering an alternative that they
> struggle _not_ to prefer.

man Page Format
===============

> > Full disclosure: I format my man pages as PDF, so I may not be
> > the best person to comment on the appearance of output to
> > a monospace device.
>
> Thank you for exercising this pathway. Deri James and I put a lot of
> work into groff 1.23 to make it nice, and further work into the
> forthcoming 1.24 to make it even better.

Next step: a man command that serves up PDF versions if present in the appropriate places (I have a crude version that does just that for my man pages as well as quite a few others, and additionally, PDF versions of Texinfo documentation). Untold effort has gone into troff, groff, Texinfo, and others so that we can do better than a Teletype 33.

Searches
========

> When staring at a Unicode terminal, it's a bad idea to assume
> one knows what character is there based on its appearance.

Often, yes.

> Search this email for 'A'. Now search for 'Α'.
> But I repeat myself.
>
> Or do I?

In most cases, I think context would suggest the best search. Would I mix Engrish and Greek? Not with my language skills ...

> If we're making a bad situation worse, it's by only a small
> margin, and the visual clarity in the face of rotten fonts
> again, I think, outweighs the argument against.

Eroff
=====

> I have seen very little on the Internet about eroff, and it
> also seems to be lost software with no extant source (or even
> binaries?). If you would take some time to jot down
> observations about it, that would be helpful to the posterity
> of this community.

I can probably provide a few tidbits, but I’ll need to rely heavily on memory.

SoftQuad
========

> Even sqtroff seems nearly forgotten in spite of its major role
> in getting groff off the ground.

I never got around to trying this ($$$), though it was on my wish list, especially after eroff departed the scene.

Strange Strings
===============

> > .ds EM \%\^\v'-.43m'_\h'-\w'_'u/2u'_\h'-3u*\w'_'u/2u'\h'1m'\h'-\w'_'u'_\v'.43m'\^
> > . . .
> > apparently eroff would not break a sequence with an
> > unclosed vertical motion.
>
> Interesting. When I get some round tuits I should find out if
> GNU troff will, and if it's worth keeping it from doing so.

Suffice it to say that my definition of EM was bespoke and born of desperation. Once I was able to create custom soft fonts that included an em dash, this was a nonissue. But as you know, you go to war with the army you have. It’s not the army you might want or wish to have at a later time.

Hyphenation Control
===================

> > The leading ‘\%’ was added for good measure; I can’t remember proving
> > whether it actually helped.
> [...]
>
> `\%` has recently annoyed me with its ambiguity.
>
> https://lists.gnu.org/archive/html/groff/2024-03/msg00208.html

What this seems to say is that ‘\%’ in the middle of a word appears to have priority.
With sensible coding like

    \%antidisestablishmentarianism
    .br
    \&\%antidisestablishmentarianism

things seem to work as expected.

> https://lists.gnu.org/archive/html/groff/2024-04/msg00000.html

This seems to suggest that ‘\%’ in the middle of a word gives the first hyphenation point but does not preclude others. I’m not sure there’s anything wrong with this, but without mention in the documentation, I agree it’s ambiguous.

Convention, Again
=================

> > So ultimately, I dunno. For the most common usages, ‘——’ may be
> > aesthetically preferable to ‘--’. But in some less common
> > situations, this may confuse more than enhance. I think it’s
> > worth hearing what others think.
>
> For man pages, the mapping can be altered (or removed) in the
> "man.local" and "mdoc.local" files.
>
> More generally, it could be dealt with in the "troffrc-end" file.

Lots of ways to do this for the sophisticated user. But I think the needs of the average user should take priority.

In closing, it should be noted that the CP1252 device (of which I probably have the only instance on the planet) also has an em dash. I render it as such, and agree that it’s not great--so I guess I need to decide whether to go with ‘--’ or ‘——’. And whether to do it in the device or in tty.tmac.

Jeff