Re: [Groff] ASCII Minus Sign in man Pages
Hi Doug,

> Originally \(pl and \(mi came from a fixed font (S) while + and \-
> came from the current font.

That matches CSTR 54, which has

    \-    Minus sign in the current font

in the table near the beginning and \(pl and \(mi as `Special Character
Names' on the last page.

> As I understand your comment, groff has reversed this troff
> convention. Additionally groff interprets - as a compromise
> HYPHEN-MINUS.

It's not that simple. :-) Here's groff 1.22.3-7 with a UTF-8 terminal.

    $ nroff <<<'- \- \(mi + \(pl' | tr -d ' \n' | recode ..dump
    UCS2   Mne   Description

    2010   -1    hyphen
    2212   -2    minus sign
    2212   -2    minus sign
    002B   +     plus sign
    002B   +     plus sign
    $
    $ troff -Tutf8 <<<'- \- \(mi + \(pl' | egrep 'font|^[Ct]'
    x font 1 R
    Chy
    C\-
    Cmi
    t+
    Cpl
    $

nroff never gave U+002D, an ASCII minus. This is a problem for a man
page wanting text with an ASCII minus that can be cut and pasted to sh.

Because Unicode has reached the TTY, and PDF can be viewed as pixels,
we've migrated away from the many-usage ASCII minus to the other
specific, more typographic, runes.

The PostScript from groff is

    /F0 10 /Times-Roman@0 SF
    2.5<2dad>72 12 S        PostScript names: 2d=hyphen ad=softhyphen
    /F1 10 /Symbol SF
    (-) A
    F0 (+) 2.5 E
    F1 (+) 2.5 E
    0 Cg EP

So troff's characters map onto these PostScript fonts and characters.

    -       Times-Roman   hyphen
    \-      Times-Roman   softhyphen
    \(mi    Symbol        hyphen
    +       Times-Roman   plus
    \(pl    Symbol        plus

All five look distinct here in gv(1) and match your description; \(mi
and \(pl are the Symbol font, \- and + are the current font.

As a solution, Ingo made the suggestion to switch \- to always be ASCII
minus because we thought \(mi was another name for \- and so still
available for the original use of a mathematical minus sign. It's spelt
out half-way through
https://lists.gnu.org/archive/html/groff/2017-04/msg00052.html starting
"That leads to a natural suggestion solving *both* of these problems".
But you've thrown a spanner by pointing out \- and \(mi are not
equivalent. :-)

Ingo, I think this means we need to pause on the switching of \- to
always be ASCII minus.

> man groff_char, however, tells the original state of affairs. What is
> one to believe?

groff_char(7) is correct.

    Output   Input    PostScript   Unicode   Notes
    +        \[pl]    plus         u002B     plus in special font
    −        \[mi]    minus        u2212     minus in special font

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
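For readers who want the nroff result above in a form they can poke at, here is a minimal Python sketch. The table simply restates the code points from the `recode ..dump` output quoted in this message; the names NROFF_UTF8_OUTPUT, describe and inputs_yielding are made up for illustration and are not part of groff.

```python
import unicodedata

# Code points taken from the nroff/recode output quoted above
# (groff 1.22.3-7, UTF-8 terminal). Keys are the troff inputs.
NROFF_UTF8_OUTPUT = {
    "-": 0x2010,
    r"\-": 0x2212,
    r"\(mi": 0x2212,
    "+": 0x002B,
    r"\(pl": 0x002B,
}

ASCII_HYPHEN_MINUS = 0x002D  # "character 45", what sh expects before an option


def describe(table):
    """Return (input, 'U+XXXX', Unicode name) for each entry of the table."""
    return [
        (src, "U+%04X" % cp, unicodedata.name(chr(cp)))
        for src, cp in table.items()
    ]


def inputs_yielding(table, codepoint):
    """List the troff inputs that produce the given code point."""
    return [src for src, cp in table.items() if cp == codepoint]


if __name__ == "__main__":
    for row in describe(NROFF_UTF8_OUTPUT):
        print("%-5s %-7s %s" % row)
    # None of the five inputs yields U+002D, which is the paste problem.
    print(inputs_yielding(NROFF_UTF8_OUTPUT, ASCII_HYPHEN_MINUS))
```

The last call returns an empty list, which is the point made in prose: with this groff version and device, no plain input produces an ASCII minus.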
Re: [Groff] Macro "itc" is needed to make escape "\c" useful
Hi,

Ingo wrote:
> Given that the man(7) .TP .itc hack got committed to groff
...
> Of course, i still don't recommend actually using it, because that
> would make your manual page misrender on groff <= 1.22.3, on mandoc <=
> 1.14.1, and on any version of anything else.

This is sad news. It's an insufficient improvement due to one man's
dislike of inline \f. It shouldn't be used. Hopefully groff's
documentation points out in all cases that it's incompatible?

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Groff] ASCII Minus Sign in man Pages
At 2017-05-02T21:29:39-0400, Doug McIlroy wrote:
> I was previously told that \(mi is the true minus sign. But the
> true minus sign, at least in my mind, must come from the current
> font, so that it comes out right wherever it occurs, even in a
> bold headline like "Fairbanks shivers at -50".

I agree.

> I'll buy Branden's first assertion, but if + and \- come from the
> current font as they originally did, and \(pl and \(mi come
> from the current font per the previous paragraph, they
> become redundant.

It wouldn't be the first redundancy in the character escapes:

From groff(7):

    \´    The acute accent ´; same as \(aa. Unescaped: apostrophe,
          right quotation mark, single quote (ASCII 0x27).
    \`    The grave accent `; same as \(ga. Unescaped: left quote,
          backquote (ASCII 0x60).
    [...]
    \_    The same as \(ul, the underline character.

I want to remove some of that overload encouragement in the
descriptions of the unescaped results above, because we have (long had)
\[cq] and \[oq] for single quotation marks, but that's another
discussion.

> So I remain confused.

I think it's a confusing issue. We didn't have Unicode back in the days
of CSTR #54, so the idea that you could get a pile of
mathematically-oriented glyphs out of the same font that you had loaded
to print your running prose was unheard of.

A quick experiment with -Z shows me that groff does still today load
the S [special] font when the \(pl and \(mi character escapes are used.

On my UTF-8 device, of course, this is a no-op.

It's not a no-op on a PostScript device, but I note no _visual_
difference.

    /F0 10/Times-Roman@0 SF 196.51(foo\(1\) quux foo\(1\))72 48 R
    (plus +)108 84 Q(mathplus)108 100.8 Q/F1 10/Symbol SF(+)2.5 E
    F0(minus \255)108 117.6 Q(mathminus)108 134.4 Q F1(-)2.5 E
    F0 211.235(baz bar)72 768 R(1) 222.615 E 0 Cg EP

On what devices do we expect a visual difference?

Regards,
Branden
Re: [Groff] ASCII Minus Sign in man Pages
Hi Branden,

> A quick experiment with -Z shows me that groff does still today load
> the S [special] font when the \(pl and \(mi character escapes are
> used.

Yes, my list email from earlier today lists the PostScript glyphs:
https://lists.gnu.org/archive/html/groff/2017-05/msg00028.html

> It's not a no-op on a PostScript device, but I note no _visual_
> difference.

That's odd. It's a noticeable difference here, even sticking with the
default Roman. Also, + and \(pl vary similarly.

    $ cat minus.tr
    .ds b \(rs
    .ds m \N'45'
    \*m -.
    \*bN'45' \N'45'.
    \*b\*m \-.
    \*b(mi \(mi.
    + +.
    \*b(pl \(pl.
    $ groff minus.tr >minus.ps
    $ gs -q -r600 -sDEVICE=pnmraw -sOutputFile=- \
    >     -dTextAlphaBits=4 - <minus.ps |
    > pnmcrop -quiet |
    > pnmmargin -white 10 |
    > pnmtopng -quiet -compression 9 >minus.png
    $

Visible at https://s29.postimg.org/ddwg1okz9/minus.png

The PostScript is using Symbol for \(mi and \(pl.

    /F0 10/Times-Roman@0 SF
    2.5 (--)72 12 S
    2.5(.\\)-2.5 G
    (N'45' -. \\- \255. \\\(mi)-2.5 E
    /F1 10/Symbol SF
    (-)2.5 E
    F0 2.5(.++)C
    2.5(.\\)-2.5 G
    (\(pl)-2.5 E
    F1 (+)2.5 E
    F0 (.)A
    0

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Groff] ASCII Minus Sign in man Pages
Folk,

I've been sort of watching from the sidelines here, but am going to
toss in my 2 cents.

First, I once heard troff/groff described as the assembly language of
typesetting. So to my mind it should be "simple" (as in not too
complicated) and stable. The first goal is forever lost.

Stable, to me, implies not changing much over time, and most changes
should be backward compatible. troff/groff has by and large met that
test. Having mastered troff at one time, the stability has saved me.
But my mastery has degraded as I have not kept up with all the
improvements, and I never was a grand master.

Backward compatible means that all code written to the existing
definitions should turn out the same results as in the past when
submitted to new assemblers. (I have nroff documents and C code from
the 1970s that still work.)

Thus when we have pieces of documented definitions that contradict each
other, the problem becomes which definition to change. The definitions
for

    - \- \(mi \(hy \(em \(en (others?)

should be clear and the implementations should implement them as
defined. To my mind - in groff should always default to the ASCII,
7-bit, undistinguished character.

When we have assemblers that contradict because of the documentation
being inconsistent, what do we do about that? For me, I want the
assembler I use, groff, to match the corrected documentation. If
different assemblers knowingly disagree with each other, it would be a
courtesy to the community to document that fact. (Witness the
documentation for many of the Linux/Unix/BSD implementations of "the
shell".) So if the current definitions for

    - \- \(hy

disagree with historical documents and implementations, they should be
documented. If I am writing at the assembly level, I can always

    .char - \-

Given those opinions, I feel it is for the macro packages, the
"compilers", to implement the necessary features such as associating
true minus signs with numbers and true hyphens with word separators.
And if -x is meant to be keyboard (7-bit ASCII) characters, the
compiler should make that so.

The unfortunate history is that the man pages and other ancient
documents come from a time when the users of macros were expected to
dive into the assembly language _frequently_ to get around the things
that the macros just did not address. And that history is still with us
in WYSIWYG (What You See Is What You Get) word processors. Want that -
to be a minus in WYSIWYG? Dive into the font table and pick out the
character there, if you can find it.

My impression is that some macros, such as Schaffter's Mom, go a long
way towards eliminating the assembly get-arounds. Still, macros take a
programmer's view of documentation, namely to compile our document
source code rather than format the WYSIWYG input. Their advantage is
that simple "commands" crank out a lot of assembler code. Calling
something a TITLE implies a lot of specifics.

All that said, the concept of having the compiler decide whether a
character should be a minus, hyphen, minus-hyphen,
UTF8-something-or-other, etc. should be in the realm of a higher level
component than troff/groff. And the fix for old documents, such as the
man pages that depend on groff for their appearance, is to edit their
source code so their specifics match the (corrected?) groff
definitions.

Mike

On Tue, May 02, 2017 at 09:29:39PM -0400, Doug McIlroy wrote:
>
> Branden wrote
>
> Ingo's proposal would not mandate that + and \- come from the special
> font.
>
> It also would not mandate that \(pl and \(mi come from the current font.
>
>
> --
>
> I was previously told that \(mi is the true minus sign. But the
> true minus sign, at least in my mind, must come from the current
> font, so that it comes out right wherever it occurs, even in a
> bold headline like "Fairbanks shivers at -50".
>
>
> I'll buy Branden's first assertion, but if + and \- come from the
> current font as they originally did, and \(pl and \(mi come
> from the current font per the previous paragraph, they
> become redundant.
>
> So I remain confused.
>
> Doug

--
Mike Bianchi
Foveal Systems
973 822-2085
mbian...@foveal.com
http://www.AutoAuditorium.com
http://www.FovealMounts.com
Re: [Groff] ASCII Minus Sign in man Pages
Hi Mike,

> Stable, to me, implies not changing much over time, and most changes
> should be backward compatible.
...
> Backward compatible means that all code written to the existing
> definitions should turn out the same results as in the past when
> submitted to new assemblers. (I have nroff documents and C code from
> the 1970s that still work.)

Agreed.

> Thus when we have pieces of documented definitions that contradict
> each other the problem becomes which definition to change. The
> definitions for
>
> - \- \(mi \(hy \(em \(en (others?)

\N'45' ?

    -        A hyphen for text, e.g. beer-flavoured ice-cream.
    \-       A minus sign in the current font.
    \(mi     A minus sign in the special font.
    \(hy     Another name for plain `-', so a hyphen for text.
    \N'45'   Glyph 45 in the current font.

> To my mind - in groff should always default to the ASCII, 7-bit,
> undistinguished character.

But it's always meant hyphen in pre-groff troff because it's a lot more
common to want a hyphen in writing than a minus sign. Then Unicode
decided ASCII minus had too many meanings and couldn't be used for any
of them so created U+2010 for hyphen, and U+2212 for minus sign, and
groff switched to producing those for the hyphen and minus sign, leaving
ASCII minus unreproducible apart from \N'45'.

man pages that were and are considered correctly written have used \-
for a command-line minus sign, e.g. `wc \-l'. (Incorrect man pages that
wrote `wc -l' can be ignored for the discussion; there seems to be the
will to fix them.) For that to paste from a man page, viewed as UTF-8
TTY, PostScript, PDF, browser, ..., it needs to be character 45.
Writing «wc \N'45'l» isn't going to gain support. :-) How to produce
it is the issue.

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
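To make the "it needs to be character 45" point concrete, here is a small Python sketch of what a paste target sees. It is only an illustration of the problem, not a fix anyone in the thread proposed: to_ascii_dashes and looks_like_option are hypothetical helper names, and the sketch assumes the -Tutf8 mapping described above (U+2010 for - and U+2212 for \-).

```python
# Hypothetical helpers, not part of groff or man: fold the typographic
# characters that groff emits on a UTF-8 terminal back to character 45.
DASH_TO_ASCII = str.maketrans({
    "\u2010": "-",  # HYPHEN, what - becomes
    "\u2212": "-",  # MINUS SIGN, what \- becomes
})


def to_ascii_dashes(text):
    """Replace U+2010 and U+2212 with ASCII HYPHEN-MINUS (U+002D)."""
    return text.translate(DASH_TO_ASCII)


def looks_like_option(word):
    """True only if the word starts with ASCII 45, as a utility expects."""
    return word.startswith("-")


if __name__ == "__main__":
    pasted = "wc \u2212l"  # `wc \-l' as rendered with U+2212
    option = pasted.split()[1]
    print(looks_like_option(option))                   # not an option to wc
    print(looks_like_option(to_ascii_dashes(option)))  # now it is
    print(to_ascii_dashes(pasted))
```

Folding characters after the fact loses the hyphen/minus distinction for ordinary prose, which is one reason the thread discusses fixing the source of the characters rather than the paste.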
Re: [Groff] ASCII Minus Sign in man Pages
Mike Bianchi wrote:

I absolutely and completely support this opinion of yours. Maybe except
that it would be nice if some future user could easily find some
documentation and know what to do to get "nice"r looking output, maybe
with some command line argument (variable), or a configuration file,
because all those misuses came in for a reason, and that cannot simply
be neglected, after all.

--steffen
|Ralph says i must not use signatures which spread the light!
Re: [Groff] Critique this bold-italic private macro for man pages
> "G. Branden Robinson" wrote on 3 May 2017 at 01:02:
>
> The .itc request is a groff extension so an additional layer of
>
> .ie \n(.g
>
> could be added.

Where do you want to add this--in the macro package? This would not be
necessary, since it is already groff's own package. We don't need to
care for those who steal it ;)

I saw testing for \n(.g in manpages--this is a bad idea. Manpages
themselves need to be written portably, without testing for this or
that formatter.

Carsten
Re: [Groff] Critique this bold-italic private macro for man pages
At 2017-05-03T17:24:41+0200, Carsten Kunze wrote:
> > "G. Branden Robinson" wrote on 3 May 2017 at 01:02:
> >
> > The .itc request is a groff extension so an additional layer of
> >
> > .ie \n(.g
> >
> > could be added.
>
> Where do you want to add this--in the macro package?

Nope. By "private macro" I mean one defined and used only within one
document.

> I saw testing for \n(.g in manpages--this is a bad idea. Manpages
> themselves need to be written portably, without testing for this or
> that formatter.

Most \n(.g tests I've seen in man pages are to try to _achieve_
portability, not break it.

E.g., ncurses uses these conditionals in many of its pages:

    .ie \n(.g .ds `` \(lq
    .el .ds `` ``
    .ie \n(.g .ds '' \(rq
    .el .ds '' ''

Regards,
Branden
Re: [Groff] ASCII Minus Sign in man Pages
> > For that to paste from a man page, viewed as UTF-8 TTY,

Erm, I may be missing something, here... but if monospaced hyphens and
minus signs are optically indistinguishable, what's the worth in
differentiating between either?

IMHO, if any change is to be made, it should be with grotty's handling
of \-. A new escape sequence (or command-line switch) could always be
added for authors/users who wish for a \- to *always* be rendered as
U+2212, even for Unicode-enabled terminals. Possible example might be
character listings (e.g., groff_char(7), or for Unicode-related
documentation).

Of course, that still wouldn't do anything for code-blocks in PDFs.
Then again, I wouldn't be copying code from a PDF without expecting to
clean it up after pasting, anyway...

(I'm sure this suggestion is sounding silly to somebody...)

On 4 May 2017 at 00:51, Ralph Corderoy wrote:

> Hi Mike,
>
> > Stable, to me, implies not changing much over time, and most changes
> > should be backward compatible.
> ...
> > Backward compatible means that all code written to the existing
> > definitions should turn out the same results as in the past when
> > submitted to new assemblers. (I have nroff documents and C code from
> > the 1970s that still work.)
>
> Agreed.
>
> > Thus when we have pieces of documented definitions that contradict
> > each other the problem becomes which definition to change. The
> > definitions for
> >
> > - \- \(mi \(hy \(em \(en (others?)
>
> \N'45' ?
>
>     -        A hyphen for text, e.g. beer-flavoured ice-cream.
>     \-       A minus sign in the current font.
>     \(mi     A minus sign in the special font.
>     \(hy     Another name for plain `-', so a hyphen for text.
>     \N'45'   Glyph 45 in the current font.
>
> > To my mind - in groff should always default to the ASCII, 7-bit,
> > undistinguished character.
>
> But it's always meant hyphen in pre-groff troff because it's a lot more
> common to want a hyphen in writing than a minus sign. Then Unicode
> decided ASCII minus had too many meanings and couldn't be used for any
> of them so created U+2010 for hyphen, and U+2212 for minus sign, and
> groff switched to producing those for the hyphen and minus sign, leaving
> ASCII minus unreproducible apart from \N'45'.
>
> man pages that were and are considered correctly written have used \-
> for a command-line minus sign, e.g. `wc \-l'. (Incorrect man pages that
> wrote `wc -l' can be ignored for the discussion; there seems to be the
> will to fix them.) For that to paste from a man page, viewed as UTF-8
> TTY, PostScript, PDF, browser, ..., it needs to be character 45.
> Writing «wc \N'45'l» isn't going to gain support. :-) How to produce
> it is the issue.
>
> --
> Cheers, Ralph.
> https://plus.google.com/+RalphCorderoy
Re: [Groff] ASCII Minus Sign in man Pages
On Wed, May 03, 2017 at 03:51:24PM +0100, Ralph Corderoy wrote:
> :
> :
>     -        A hyphen for text, e.g. beer-flavoured ice-cream.
> :
> > To my mind - in groff should always default to the ASCII, 7-bit,
> > undistinguished character.
>
> But it's always meant hyphen in pre-groff troff because it's a lot more
> common to want a hyphen in writing than a minus sign.

_I_ would claim this interpretation was a mistake. ((My opinion only
here.))

The - character exists on all keyboards. It is not labeled minus or
hyphen or endash. It generates the decimal 45 (hex 0x2D, octal 055)
character. That any *roff processor would give it a different meaning
is most unfortunate. Especially because hyphenation is a built-in
feature of *roff and, once there was the concept of \(hy, hyphenated
words should have used it.

Note please that I am not saying that - should be interpreted as \(mi
either.

> :
>     \-       A minus sign in the current font.
>     \(mi     A minus sign in the special font.

I would claim that \- makes sense, but \(mi coming from the special
font is a hold-over from the _first_ troff at Bell Labs, which was
tailored to the first supported phototypesetter, a machine that
supported four 102-character fonts. They were Roman, Bold, Italic, and
Special (R B I S). Special was the Greek alphabet and other needed
characters. Us old-timers fondly remember the Bell System bell.

See https://en.wikipedia.org/wiki/Troff
"CAT phototypesetter"
https://en.wikipedia.org/wiki/CAT_%28phototypesetter%29

((And interestingly, the current S (Symbol) font also contains the
numbers, presumably so that they and the arithmetic and logic operators
could all look alike in mathematical writing. I'm guessing that *roff
does not take the digits from the Symbol font by default. I think of
that as an effective argument for not making \(mi draw from the S font
by default.))

>     \-       A minus sign in the current font.
>     \(mi     A minus sign in the special font.
>     \(hy     Another name for plain `-', so a hyphen for text.
>     \N'45'   Glyph 45 in the current font.

Once fonts distinguished between minus and hyphen with distinctive
glyphs, then \(mi and \(hy should have come from the current font,
especially if neither is \N'45'.

BUT that is MY opinion. What I am pushing for is that all the groff
documentation speak truth on this matter.

> ... paste from a man page, viewed as UTF-8
> TTY, PostScript, PDF, browser, ..., it needs to be character 45.
> Writing «wc \N'45'l» isn't going to gain support. :-)
> How to produce it is the issue.

Absolutely. I propose

    wc -l

if - was \N'45'. It would make sense for future generations.

As a first generation UNIX citizen it is interesting to contemplate how
much longer the man pages groff documents will be relevant.

--
Mike Bianchi
Foveal Systems
973 822-2085
mbian...@foveal.com
http://www.AutoAuditorium.com
http://www.FovealMounts.com
Re: [Groff] Critique this bold-italic private macro for man pages
> "G. Branden Robinson" wrote on 3 May 2017 at 17:30:
>
> Nope. By "private macro" I mean one defined and used only within one
> document.

A manpage is "one document". Or what do you refer to?

> Most \n(.g tests I've seen in man pages are to try to _achieve_
> portability, not break it.
>
> E.g., ncurses uses these conditionals in many of its pages:
>
> .ie \n(.g .ds `` \(lq
> .el .ds `` ``
> .ie \n(.g .ds '' \(rq
> .el .ds '' ''

So all formatters except groff are only allowed to have second class
output? Exactly because of this poor code Heirloom sets .g to 1. I
assume also mandoc(1) reads \n(.g as 1. Had this been the intention
behind register .g in groff? So when all relevant tools set .g to 1,
what is the point of this register? ;)

Three tools know of these special characters. Either change other tools
to support groff_char(7), groff_man(7) and groff_mdoc(7) or replace
them with one of the three tools.
Re: [Groff] Critique this bold-italic private macro for man pages
> Carsten Kunze wrote on 3 May 2017 at 21:37:
>
> > E.g., ncurses uses these conditionals in many of its pages:
> >
> > .ie \n(.g .ds `` \(lq
> > .el .ds `` ``
> > .ie \n(.g .ds '' \(rq
> > .el .ds '' ''

I overlooked the word ncurses...

Ok, there had been days when this distinction made sense. But it
should not be used in new code anymore...

Carsten
Re: [Groff] Critique this bold-italic private macro for man pages
Is there literally no way to identify when a modern (non-GNU) troff is
being used?

On 4 May 2017 at 05:46, Carsten Kunze wrote:

> > Carsten Kunze wrote on 3 May 2017 at 21:37:
> >
> > > E.g., ncurses uses these conditionals in many of its pages:
> > >
> > > .ie \n(.g .ds `` \(lq
> > > .el .ds `` ``
> > > .ie \n(.g .ds '' \(rq
> > > .el .ds '' ''
>
> I overlooked the word ncurses...
>
> Ok, there had been days when this distinction made sense. But it
> should not be used in new code anymore...
>
> Carsten
Re: [Groff] Critique this bold-italic private macro for man pages
> John Gardner wrote on 3 May 2017 at 21:55:
>
> Is there literally no way to identify when a modern (non-GNU) troff is
> being used?

General typesetting is something else. Heirloom has this kludge only
for manpages; neatroff (AFAIK not used for manpages) likely does not
set .g.

There are ways to detect the formatter but a manpage must not do this.
IMHO a manpage should suppose that groff is used. If groff has bugs
(e.g. compared to mandoc(1)) they should be fixed.

Carsten
Re: [Groff] Critique this bold-italic private macro for man pages
> "G. Branden Robinson" wrote on 3 May 2017 at 22:47:
>
> So ncurses should be gating on the definition of the glyph rather than
> on whether groff is the typesetter, right?
>
> .ie c \(lq .ds `` \(lq
> .el .ds `` ``
> .ie c \(rq .ds '' \(rq
> .el .ds '' ''
>
> What do you think?

Short answer: We should distinguish between manpages and other
typesetting.

For general typesetting a document is created for one special tool. If
not, if it is e.g. a macro package like -mom, it needs to detect the
tool to make use of the tool's special features. There are severe
differences (not in general but regarding special features) between
groff, Heirloom and neatroff. If you e.g. intend to write a book, it
may be better to choose one of these tools and then write a document
using all powerful features, which consequently is not portable--but
this is ok.

Manpages should use the -man or -mdoc macros themselves with as few
low level requests or escapes as possible. -mdoc should not need low
level *roff elements at all. As Ingo said, there are three major tools.
Most systems IMHO use groff, more and more are using mandoc(1). If
there is an ancient system with an old *roff tool, this tool should be
replaced.

To answer your question: Simply use \(lq etc. (in manpages) and assume
the tool supports it.

Carsten
Re: [Groff] Critique this bold-italic private macro for man pages
Hi,

Carsten Kunze wrote on Wed, May 03, 2017 at 09:37:21PM +0200:

> I assume also mandoc(1) reads \n(.g as 1.

Yes:

    $ echo '\\n(.g' | mandoc | sed -n 5p
    1
    $ less /co/mdocml/roff.c
    int
    roff_getreg(const struct roff *r, const char *name)
    {
            int val;

            if ('.' == name[0] && '\0' != name[1] && '\0' == name[2]) {
                    val = roff_getregro(r, name + 1);
                    if (-1 != val)
                            return val;
            }
            /* ... handle read-write registers ... */
    }

    /*
     * Handle some predefined read-only number registers.
     * ...
     */
    static int
    roff_getregro(const struct roff *r, const char *name)
    {
            switch (*name) {
            case 'g':  /* Groff compatibility mode is always on. */
                    return 1;
            /* ... handle some other read-only registers ... */
            }
    }

It's a pervasive pattern: When writing portable software (or in this
case, portable documentation), and when trying to figure out whether
the platform we are currently trying to run on supports a given feature
we wish to use, *never* test for software or operating system names or
for version numbers. That plainly doesn't work in practice: If you test
for names, the functionality of those programs or systems will change
over time, and if you test for version numbers, you will run into
implementations you never considered (and hence don't know any version
numbers for) but that still support the feature, or you run into
implementations that do have version numbers, but where they mean
something completely unexpected.

For software, write feature tests instead and configure your software
accordingly for compilation.

For manual pages, do not write any test code at all. If you put your
feature tests into the manual pages themselves, they are not only
exceedingly ugly, but often a bigger portability problem than the
features you are trying to test for in the first place. If you keep the
feature tests separate and try to automatically edit the manual page
source code accordingly before installing the manual pages, that
usually turns into a maintenance nightmare. Manual pages just don't
need the compilation step that program code needs, and inventing one
just for autoconfiguration purposes is seriously going over the top.

If writing man(7), stick to the lowest common denominator features that
are supported by practically everything.

If writing mdoc(7), write it for the modern standard as documented in
groff_mdoc(7) and mandoc mdoc(7). Heirloom copes with that as well, and
legacy implementations are virtually nonexistent.

Yours,
Ingo
Re: [Groff] Critique this bold-italic private macro for man pages
Hi Branden,

> .ie c \(lq .ds `` \(lq
> .el .ds `` ``
> .ie c \(rq .ds '' \(rq
> .el .ds '' ''
>
> What do you think?

It doesn't work:

    $ uname -a
    SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
    $ cat tmp.roff
    .ie c \(lq .ds `` \(lq
    .el .ds `` ``
    .ie c \(rq .ds '' \(rq
    .el .ds '' ''
    >>>\*(``hello world!\*(''<<<
    $ nroff tmp.roff
    >>>hello world!<<<
    $ troff tmp.roff | /usr/lib/lp/postscript/dpost | sed -n '/>>>/,/<<</p'
    (>>>)720 120 w
    10 R f
    (hello world!)1 499 1 888 120 t
    10 S1 f
    (<<<)1387 120 w

    $ nroff
    >>>
    .ie c \(mi defined
    .el undefined
    <<<
    ^D
    >>> <<<
    $ echo '>>>\(lqhello world!\(rq<<<' | nroff
    >>>hello world!<<<

There are real-world systems (sold today) where neither \(lq nor the
'c' conditional is supported. And yes, that kind of nroff may be used
for manual page display by default:

    $ strings `which man` | grep roff
    /usr/lib/sgml/sgml2roff
    troff
    lp -c -T troff
    nroff -u0 -Tlp

Yours,
Ingo
Re: [Groff] Critique this bold-italic private macro for man pages
On Wed, 3 May 2017 22:06:10 +0200 (CEST) Carsten Kunze wrote:

> There are ways to detect the formatter but a manpage must not do
> this.

Why not? ISTM we'd have better manpages if they weren't constrained to
the rendering capability of a VT-100 terminal. For example, equations
or pictures could augment the text, or replace some of it, when
"printed".

--jkl
Re: [Groff] ASCII Minus Sign in man Pages
On Wed, 3 May 2017 13:42:55 -0400 Mike Bianchi wrote:

> The - character exists on all keyboards. It is not labeled minus
> or hyphen or endash. It generates the decimal 45 (hex 0x2D, octal
> 055) character. That any *roff processor would give it a different
> meaning is most unfortunate.

IMO that is the principle that should be applied: every unadorned
character appearing in troff input should represent itself. If you want
something other than that, groff_char(7) describes your options.

IIUC, this debate about how to render - and \- stems from a conflict in
historical practice. Is the following correct?

When troff was young, terminals were ASCII and the - character was
0x2d. Manpage guidelines encouraged the use of \- for flags because
they rendered nicely in printed documents with no harm done to nroff
output. They did that despite the obvious fact that the manpage is
there to describe what to type, and basically no one can type the
denoted character.

Then Unicode pronounced that 0x2d was neither fish nor fowl, and gave
us hyphen, minus, and endash characters. groff dutifully mapped - onto
hyphen and \- onto minus. But when terminals gained Unicode capability,
some of them lost cut-and-paste convenience. The debate is over how to
recover that convenience.

Oddly, my system doesn't exhibit any cut-and-paste anomaly despite
using xterm with the "-en UTF-8" option. Searching for - in less also
works.

If it's a UI issue we're confronting, perhaps it's really up to the UI
to deal with. The man utility can certainly impose on nroff the
requirement that - and \- both render as 0x2d. Then it shows up
correctly in the pager. It is visually acceptable to the user, and DTRT
regarding the UI. (Maybe that's what Ubuntu LTS does for me; I don't
know.)

It's not obvious to me groff should make any change at all. At most,
reverting the mapping of - so that it outputs 0x2d again would undo a
nonobvious, subtle change in favor of simplicity. Possibly some degree
of outreach to the UI community would be a service, too.

--jkl
Re: [Groff] Critique this bold-italic private macro for man pages
At 2017-05-04T01:04:48+0200, Ingo Schwarze wrote:
> Hi Branden,
>
> > .ie c \(lq .ds `` \(lq
> > .el .ds `` ``
> > .ie c \(rq .ds '' \(rq
> > .el .ds '' ''
> >
> > What do you think?
>
> It doesn't work:
>
> $ uname -a
> SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
> $ cat tmp.roff
> .ie c \(lq .ds `` \(lq
> .el .ds `` ``
> .ie c \(rq .ds '' \(rq
> .el .ds '' ''
> >>>\*(``hello world!\*(''<<<
> $ nroff tmp.roff
> >>>hello world!<<<
> $ troff tmp.roff | /usr/lib/lp/postscript/dpost | sed -n '/>>>/,/<<</p'
> (>>>)720 120 w
> 10 R f
> (hello world!)1 499 1 888 120 t
> 10 S1 f
> (<<<)1387 120 w
>
> $ nroff
> >>>
> .ie c \(mi defined
> .el undefined
> <<<
> ^D
> >>> <<<
> $ echo '>>>\(lqhello world!\(rq<<<' | nroff
> >>>hello world!<<<
>
> There are real-world systems (sold today) where neither \(lq
> nor the 'c' conditional is supported. And yes, that kind
> of nroff may be used for manual page display by default:
>
> $ strings `which man` | grep roff
> /usr/lib/sgml/sgml2roff
> troff
> lp -c -T troff
> nroff -u0 -Tlp

N.B. my comments here are situated in a context _outside_ of the GNU
Troff project.

Why do my man pages need to be more portable than the shell scripts or
C code I ship with them? What is the value in reading the _formatted_
version of a man page for a tool that won't compile or run correctly on
the host?

I refuse to write shell scripts for general-purpose consumption only in
the historical Bourne dialect that Solaris /bin/sh was, and I refuse to
write man pages for general-purpose consumption only in some minimal
common subset that _no one_ has troubled themselves to carefully
define.

As I've said before, I fear the talk of "safe subsets" and portability
in groff_man(7) and man(7) is years out of date and in places
anticipated a golden age of direct man-to-HTML rendering that never
came to pass for several reasons.

I want the "man language" to be small, as I also said before, but not
cripplingly so, and I want it to be informed by design, not accidental
overlaps in Venn diagrams of traits supported by unmaintained
implementations.

For example, even if that old Solaris system can render the output of
docbook-to-man legibly, that wouldn't mean that docbook-to-man's output
is well-considered or a model to emulate in any particulars.

Finally, Groff's own portability is sufficiently broad that we should
be able to say, "as a rule, if you want man pages written in the past
25 years to render nicely on Boozix 11.0, Gizmoware's custom hybrid of
4.3BSD and SVr3, please build and install Groff so that its macro
packages can achieve this for you."

Regards,
Branden
Re: [Groff] Critique this bold-italic private macro for man pages
At 2017-05-03T20:13:29-0400, James K. Lowden wrote:
> On Wed, 3 May 2017 22:06:10 +0200 (CEST)
> Carsten Kunze wrote:
>
> > There are ways to detect the formatter but a manpage must not do
> > this.
>
> Why not? ISTM we'd have better manpages if they weren't constrained
> to the rendering capability of a VT-100 terminal.

Oh, it can get much worse than that. If we were to tell man page writers to target only the common subset of features implemented by everything that's claimed to be VT-100-compatible over the years, we would be left only with that which devascii offers, and even that is probably too sophisticated.

> For example, equations or pictures could augment the text, or replace
> some of it, when "printed".

Yes. For the past several years, xterm has supported DEC's ReGIS[1] and Sixel[2] graphics.

Note that when people call something "VT-100-compatible", they generally mean some smorgasbord of features cribbed from the VT100 and its many successor terminals (VT220, VT320, VT420, VT520, VT525), whatever seemed cool and/or was easy to implement. At the same time, the finer details of getting baseline VT100 emulation completely correct with respect to, say, ACS ("alternate character set") and cursor movement were frequently neglected.

[1] https://www.youtube.com/watch?v=Dmrmj5y72kg
[2] https://upload.wikimedia.org/wikipedia/commons/6/62/W3m-wikipedia.png

Regards,
Branden

Re: [Groff] ASCII Minus Sign in man Pages
Hi Ralph,

Ralph Corderoy wrote on Wed, May 03, 2017 at 03:51:24PM +0100:

>    -       A hyphen for text, e.g. beer-flavoured ice-cream.
>    \-      A minus sign in the current font.
>    \(mi    A minus sign in the special font.
>    \(hy    Another name for plain `-', so a hyphen for text.
>    \N'45'  Glyph 45 in the current font.

The trouble with \N'45' is that it does not have a fixed meaning and that the resulting glyph varies wildly. Even if you only look at groff and only at git master HEAD, \N'45' means:

   -Tascii:   U+002D HYPHEN-MINUS
   -Tlatin1:  U+002D HYPHEN-MINUS
   -Tutf8:    U+002D HYPHEN-MINUS
   -Thtml:    U+002D HYPHEN-MINUS
   -Tcp1047:  nothing useful, a control character (ENQ, enquiry character)
   -Tps:      hyphen; that's the same character as \(hy
   -Tpdf:     hyphen; that's the same character as \(hy
   -Tdvi:     hyphen; that's the same character as \(hy
   -Tlbp:     hyphen; that's the same character as \(hy
   -Tlj4:     undefined, no character at all

While this is clearly what you want for -Tascii, -Tlatin1, -Tutf8, and -Thtml, and in particular for -Tutf8 where it is a reasonably wide glyph looking like a minus sign and also the right character for copy and paste, it is dubious whether \N'45' is what you want for -Tps, -Tpdf, -Tdvi, and -Tlbp.

First, the fact that the character number agrees with ASCII doesn't mean much. While the arrangement of the glyphs for the TR font of the -Tps device is loosely based on ASCII, the codepoints for several characters mismatch. For example, U+0027 APOSTROPHE, which you get with \(aq, is codepoint 8 in -Tps TR, *not* codepoint 39 as you would expect, which is instead \(cq = U+2019 RIGHT SINGLE QUOTATION MARK. U+0060 GRAVE ACCENT is codepoint 146, not 96, which is instead \(oq = U+2018 LEFT SINGLE QUOTATION MARK. U+005E CIRCUMFLEX ACCENT has two associated codepoints, the expected 94 (produced with ^) and the unexpected 0 (\(ha). So has U+007E TILDE, the expected 126 (~) and the unexpected 1 (\(ti).

It is also amusing that in the appendix below, the character \(en transforms to seven different glyph numbers in nine different output devices, and only one of them is related to ASCII, which emphasizes the weak relationship between glyph numbers (even for -Tps) and ASCII character numbers.

Then, the glyph you get for \N'45' in -Tps TR is *not* similar to the wide glyph that you would expect for U+002D HYPHEN-MINUS. Instead, it is a typical, short hyphen.

So, which glyph in -Tps TR does represent U+002D HYPHEN-MINUS? Asking the question that way, the full dilemma becomes obvious: Just like in classical typography, there is *none*. So not only does groff provide no way to request an output glyph for "ASCII -", worse, the fonts for the important -Tps and -Tpdf devices do not even contain such a glyph!

Consider a program that wants to copy text out of a groff-generated PostScript or PDF document and paste it into a Unicode terminal window. Such a program could reasonably be expected to translate -Tps TR codepoint 45 into U+2010 HYPHEN because that's what you get from the unambiguous \(hy input character, and it could reasonably be expected to translate -Tps TR codepoint 173 into U+2212 MINUS SIGN because, as Doug kindly reminded us, \- is the minus sign in the current font and \(mi is not in the TR font at all. But now we have already exhausted the glyphs in -Tps TR and there is no glyph left that could be converted into U+002D HYPHEN-MINUS.

Many people have said during this discussion that they wouldn't expect copy and paste from a PDF viewer to a UTF-8 terminal to work. The above may well be part of the precise reason why indeed it cannot fully work.

On the other hand, i stumbled because nowadays, we have got used to the feeling that Unicode might be a superset of everything. So i considered \- and \(mi redundant because both map to U+2212, and so i hoped one of them might be up for grabs.
Yet Doug kindly reminded us what the distinction is, and that distinction indeed cannot be represented in terms of Unicode codepoints. So i have to reluctantly conclude that your original problem, Ralph, of requesting "a glyph representing U+002D HYPHEN-MINUS" is unsolvable. At least not without adding yet another glyph to the -Tps TR font. But even that would hardly help, for two reasons: all PostScript and PDF viewer software would first have to catch up, correctly recognize the glyph, and correctly translate it to U+002D HYPHEN-MINUS - but the ecosystem such software lives in evolves incredibly slowly nowadays. And then you would have to define a new character escape sequence to access the new glyph, since all the existing escapes already mean something else. But that's effectively a reductio ad absurdum: telling all manual page authors to, henceforth, write "wc \(hml" is not going to fly. Very few would understand and follow that, and even i would tend to resist it as excessive complication. So i fear we are left with the traditional workaround:
Re: [Groff] Critique this bold-italic private macro for man pages
Hi Branden,

G. Branden Robinson wrote on Wed, May 03, 2017 at 08:52:42PM -0400:

> Why do my man pages need to be more portable than the shell scripts
> or C code I ship with them?

They need not, but i would consider aiming for about the same level of portability reasonable. Meaning, that they work on systems that you target.

> What is the value in reading the _formatted_ version of a man page
> for a tool that won't compile or run correctly on the host?

That's indeed not very high priority, but why do you think that any (reasonably portable) software would not run on Solaris 11?

I do not want to promote Oracle software, and in fact i use none for my own purposes. But Solaris 11 is a certified UNIX system conforming to the 2016 edition of the Single UNIX Specification, version 4:

   https://www.opengroup.org/openbrand/register/brand3585.htm

That implies that the native compiler conforms to ISO C 99 and that the native /bin/sh is a POSIX shell.

> I refuse to write shell scripts for general-purpose consumption
> only in the historical Bourne dialect that Solaris /bin/sh was,

That doesn't apply to Solaris 11.

> and I refuse to write man pages for general-purpose consumption
> only in some minimal common subset that _no one_ has troubled
> themselves to carefully define.

Admittedly, C and sh(1) are well standardized, and man(7) is not. All the same, would you want your manual pages to break on systems where your program compiles and runs just fine?

Yes, Solaris typically does need some porting work for code that is mostly developed on BSD or Linux, even though it is a POSIX system. But porting is usually not a big deal. At least not for Solaris 11. Solaris 9, on the other hand, is a bit more of an adventure, but i didn't talk about the nroff you might find there, either.

[...]
> For example, even if that old Solaris system

I often snarkily call any version of Solaris "archaic software" myself. All the same, this particular system is a very new system on sale today.

> can render the output of docbook-to-man legibly,

It absolutely cannot. On that system, even though it is the newest that you can buy from Oracle today, any kind of docbook-to-man output will fall flat on its face before it even finds the time to be surprised enough to utter a grunt.

[...]
> Finally, Groff's own portability is sufficiently broad that we should be
> able to say, "as a rule, if you want man pages written in the past 25
> years to render nicely on Boozix 11.0, Gizmoware's custom hybrid of
> 4.3BSD and SVr3, please build and install Groff so that its macro
> packages can achieve this for you."

As a matter of fact, groff is already installed on that system:

   $ groff -v | head -n1
   GNU groff version 1.19.2

It is a very new Solaris system and not exactly the newest groff available.

So, you want to use groff? Fine, read the manual!

   $ man groff 2>&1 | cat
   Reformatting page. Please Wait... done
   $ echo $?
   0

Hum. It looks like that manual page uses features that the native man(1) cannot handle, likely because it uses the native nroff(1). So, to read the groff manual pages, you already need to know how to use groff:

   $ gnroff -c -mandoc /usr/share/man/man1/groff.1 | less

Voila. Now it works.

I admit such a system is hard to use, and i wouldn't use it for productive work myself. But completely dismissing it when talking about "portable software" is not gonna cut it.

Oh, by the way, if you don't like the native shell:

   $ bash --version | head -n2
   GNU bash, version 4.1.17(1)-release (sparc-sun-solaris2.11)
   Copyright (C) 2009 Free Software Foundation, Inc.

Same pattern as with groff: Not at all up to date, but *something* is there that is maybe almost acceptable for some purposes.

Yours,
  Ingo

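The manual workaround shown in the session above could be wrapped in a small script. What follows is only a sketch under stated assumptions: the helper names first_available and format_man_page are hypothetical, `gnroff -c -mandoc` is taken from the session above, and `nroff -man` is assumed as the fallback on systems where no GNU formatter is installed.

```shell
#!/bin/sh
# first_available (hypothetical helper): print the first argument that
# names a command found on PATH; return 1 if none of them is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# format_man_page (hypothetical helper): format one man page source file,
# preferring GNU nroff (installed as gnroff on the system discussed above)
# over the native nroff.
format_man_page() {
    formatter=$(first_available gnroff nroff) || {
        echo "no roff formatter found" >&2
        return 1
    }
    case $formatter in
        gnroff) gnroff -c -mandoc "$1" ;;   # options as used in the session above
        *)      nroff -man "$1" ;;          # assumption: traditional -man macros
    esac
}
```

A call such as `format_man_page /usr/share/man/man1/groff.1 | less` would then pick gnroff where it exists. The order of the candidates is a design choice, not something the thread prescribes.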
Re: [Groff] ASCII Minus Sign in man Pages
Hi James,

James K. Lowden wrote on Wed, May 03, 2017 at 08:13:18PM -0400:

> IIUC, this debate about how to render - and \- stems from a conflict in
> historical practice. Is the following correct?
>
> When troff was young, terminals were ascii and the - character
> was 0x2d. Manpage guidelines encouraged the use of \- for flags because
> they rendered nicely in printed documents with no harm done to nroff
> output. They did that despite the obvious fact that the manpage is
> there to describe what to type, and basically no one can type the
> denoted character.
>
> Then Unicode pronounced that 0x2d was neither fish nor fowl,
> and gave us hyphen, minus, and endash characters. groff dutifully
> mapped - onto hyphen \- onto minus. But when terminals gained Unicode
> capability, some of them lost cut-and-paste convenience.

So far, that is maybe somewhat simplified, but more or less to the point. For details of early runoff/roff history, see

   http://manpages.bsd.lv/history.html

Basically, you are starting your narrative in 1973. At that point, the language was about nine years old and had seen at least ten earlier implementations in about eight different programming languages on about five different operating systems on about eight different machines by at least ten different authors. Finding out how all those handled "-" when they were young might be non-trivial.

> The debate is over how to recover that convenience.

No, it is not. That was solved long ago, at the latest here:

   commit 98acc924f4e32cfc2209df5db0c21921df8cc7ac
   Author: Werner LEMBERG
   Date: Fri Jan 2 23:16:20 2009 +

       * tmac/an-old.tmac, tmac/doc.tmac: For -Tutf8, map \-, -, ', and `
       conservatively to ASCII for the sake of easy cut and paste.

The debate is over three different topics:

 1. Cut and paste from -Tps, -Tpdf, and -Thtml.
 2. What to use if ASCII HYPHEN-MINUS is desired in the output, both in manual pages and in other documents.
 3. What to use if a mathematical minus sign is desired in the output.

> Oddly, my system doesn't exhibit any cut-and-paste anomaly despite
> using xterm with the "-en UTF-8" option. Searching for - in less
> also works.

Yes, due to Werner's change in 2009 quoted above. One of the effects is that in manual pages, "-" and "\-" in the input always render as U+002D HYPHEN-MINUS in -Tutf8.

> If it's a UI issue we're confronting, perhaps it's really up to the UI
> to deal with. The man utility can certainly impose on nroff the
> requirement that - and \- both render as 0x2d. Then it shows up
> correctly in the pager. It is visually acceptable to the user, and
> DTRT regarding the UI. (Maybe that's what Ubuntu LTS does for me; I
> don't know.)

It's not Ubuntu, it's groff itself already doing that for you.

> It's not obvious to me groff should make any change at all. At most,
> reverting the mapping of - so that it outputs 0x2d again would undo a
> nonobvious, subtle change in favor of simplicity.

Probably not, because that would break each and every existing non-manpage roff document.

Besides, i just noticed that it's completely unclear what "output U+002D HYPHEN-MINUS in a PostScript or PDF document" is even supposed to mean, see my other mail...

Yours,
  Ingo
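
To make the cut-and-paste problem concrete at the byte level, here is a minimal shell sketch. It is not part of groff. It assumes a POSIX sh with printf, od, tr and sed, and the helper names show_bytes and to_ascii_dash are made up for illustration. It shows that U+2010 HYPHEN and U+2212 MINUS SIGN are three-byte sequences in UTF-8, so sh cannot take them for the one-byte U+002D that introduces an option, and how a filter could map both back to U+002D in pasted text.

```shell
#!/bin/sh
# UTF-8 byte sequences written as octal escapes, so this script stays ASCII.
HYPHEN=$(printf '\342\200\220')   # U+2010 HYPHEN
MINUS=$(printf '\342\210\222')    # U+2212 MINUS SIGN

# show_bytes (hypothetical helper): print the bytes of its argument in hex.
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# to_ascii_dash (hypothetical helper): filter stdin, replacing U+2010 and
# U+2212 with U+002D HYPHEN-MINUS.
to_ascii_dash() {
    sed -e "s/$HYPHEN/-/g" -e "s/$MINUS/-/g"
}

show_bytes -            # prints 2d
show_bytes "$HYPHEN"    # prints e28090
show_bytes "$MINUS"     # prints e28892

# Text as it might be pasted from -Tutf8 output without the 2009 mapping:
printf 'wc %sl\n' "$MINUS" | to_ascii_dash    # prints: wc -l
```

Such a filter only helps after the fact and cannot tell a real mathematical minus from an option dash, which is why the thread looks for a solution on the input side instead.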