Re: [Groff] : ASCII Minus Sign in man Pages

2017-05-03 Thread Ralph Corderoy
Hi Doug,

> Originally \(pl and \(mi came from a fixed font (S) while + and \-
> came from the current font.

That matches CSTR 54 which has

\-      Minus sign in the current font

in the table near the beginning and \(pl and \(mi as `Special Character
Names' on the last page.

> As I understand your comment, groff has reversed this troff
> convention. Additionally groff interprets - as a compromise
> HYPHEN-MINUS.

It's not that simple.  :-)  Here's groff 1.22.3-7 with a UTF-8 terminal.

$ nroff <<<'- \- \(mi + \(pl' | tr -d ' \n' | recode ..dump
UCS2   Mne   Description

2010   -1    hyphen
2212   -2    minus sign
2212   -2    minus sign
002B   +     plus sign
002B   +     plus sign
$ 

$ troff -Tutf8 <<<'- \- \(mi + \(pl' | egrep 'font|^[Ct]'
x font 1 R
Chy
C\-
Cmi
t+
Cpl
$

nroff never gave U+002D, an ASCII minus.  This is a problem for a man
page wanting text with an ASCII minus that can be cut and pasted to sh.
Because Unicode has reached the TTY, and PDF can be viewed as pixels,
we've migrated away from the many-usage ASCII minus to the other
specific, more typographic, runes.
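
For readers who want to check which character a formatter actually emitted, here is a minimal Python sketch. It is not part of groff or recode, and the helper names `describe` and `pastes_cleanly` are made up for illustration. It names the three code points discussed above and tests whether a string contains only the ASCII form that sh accepts.

```python
import unicodedata

# The three dash-like characters this thread keeps running into.
CANDIDATES = ["\u002d", "\u2010", "\u2212"]


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


def pastes_cleanly(text):
    """True if text contains neither U+2010 nor U+2212."""
    return not any(ch in "\u2010\u2212" for ch in text)


for ch in CANDIDATES:
    print(describe(ch))
```

Running `pastes_cleanly` over text copied from a rendered page is a quick way to see whether an option such as `wc -l` came through as character 45.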

The PostScript from groff is

/F0 10 /Times-Roman@0 SF
2.5<2dad>72 12 S         PostScript names: 2d=hyphen  ad=softhyphen
/F1 10 /Symbol SF
(-) A
F0
(+) 2.5 E
F1
(+) 2.5 E
0 Cg EP

So troff's characters map onto these PostScript fonts and characters.

-     Times-Roman  hyphen
\-    Times-Roman  softhyphen
\(mi  Symbol       hyphen
+     Times-Roman  plus
\(pl  Symbol       plus

All five look distinct here in gv(1) and match your description;  \(mi
and \(pl are the Symbol font, \- and + are the current font.

As a solution, Ingo made the suggestion to switch \- to always be ASCII
minus because we thought \(mi was another name for \- and so still
available for the original use of a mathematical minus sign.  It's spelt
out half-way through
https://lists.gnu.org/archive/html/groff/2017-04/msg00052.html starting
"That leads to a natural suggestion solving *both* of these problems".

But you've thrown a spanner by pointing out \- and \(mi are not
equivalent.  :-)

Ingo, I think this means we need to pause on the switching of \- to
always be ASCII minus.

> man groff_char, however, tells the original state of affairs.  What is
> one to believe?

groff_char(7) is correct.

Output  Input  PostScript  Unicode  Notes
+       \[pl]  plus        u002B    plus in special font
−       \[mi]  minus       u2212    minus in special font

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Groff] Macro "itc" is needed to make escape "\c" useful

2017-05-03 Thread Ralph Corderoy
Hi,

Ingo wrote:
> Given that the man(7) .TP .itc hack got committed to groff
...
> Of course, i still don't recommend actually using it, because that
> would make your manual page misrender on groff <= 1.22.3, on mandoc <=
> 1.14.1, and on any version of anything else.

This is sad news.  It's an insufficient improvement due to one man's
dislike of inline \f.  It shouldn't be used.  Hopefully groff's
documentation points out in all cases that it's incompatible?

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread G. Branden Robinson
At 2017-05-02T21:29:39-0400, Doug McIlroy wrote:
> I was previously told that \(mi is the true minus sign. But the
> true minus sign, at least in my mind, must come from the current
> font, so that it comes out right wherever it occurs, even in a
> bold headline like "Fairbanks shivers at -50".

I agree.

> I'll buy Branden's  first assertion, but if + and \- come from the
> current font as they originally did, and \(pl and \(mi come
> from the current font per the previous paragraph, they
> become redundant.

It wouldn't be the first redundancy in the character escapes:

From groff(7):


   \´     The acute accent ´; same as \(aa.  Unescaped: apostrophe,
          right quotation mark, single quote (ASCII 0x27).
   \`     The grave accent `; same as \(ga.  Unescaped: left quote,
          backquote (ASCII 0x60).
[...]
   \_     The same as \(ul, the underline character.

I want to remove some of that overload encouragement in the descriptions
of the unescaped results above, because we have (long had) \[cq] and
\[oq] for single quotation marks, but that's another discussion.

> So I remain confused.

I think it's a confusing issue.  We didn't have Unicode back in the days
of CSTR #54, so the idea that you could get a pile of
mathematically-oriented glyphs out of the same font that you had loaded
to print your running prose was unheard of.

A quick experiment with -Z shows me that groff does still today load the
S [special] font when the \(pl and \(mi character escapes are used.

On my UTF-8 device, of course, this is a no-op.

It's not a no-op on a PostScript device, but I note no _visual_
difference.

/F0 10/Times-Roman@0 SF 196.51(foo\(1\) quux foo\(1\))72 48 R(plus +)108
84 Q(mathplus)108 100.8 Q/F1 10/Symbol SF(+)2.5 E F0(minus \255)108 
117.6 Q(mathminus)108 134.4 Q F1(-)2.5 E F0 211.235(baz bar)72 768 R(1) 
222.615 E 0 Cg EP

On what devices do we expect a visual difference?

Regards,
Branden



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Ralph Corderoy
Hi Branden,

> A quick experiment with -Z shows me that groff does still today load
> the S [special] font when the \(pl and \(mi character escapes are
> used.

Yes, my list email from earlier today lists the PostScript glyphs:
https://lists.gnu.org/archive/html/groff/2017-05/msg00028.html

> It's not a no-op on a PostScript device, but I note no _visual_
> difference.

That's odd.  It's a noticeable difference here, even sticking with the
default Roman.  Also, + and \(pl vary similarly.

$ cat minus.tr
.ds b \(rs
.ds m \N'45'
\*m -. \*bN'45' \N'45'. \*b\*m \-. \*b(mi \(mi. + +. \*b(pl \(pl.
$ groff minus.tr >minus.ps
$ gs -q -r600 -sDEVICE=pnmraw -sOutputFile=- \
> -dTextAlphaBits=4 - <minus.ps |
> pnmcrop -quiet |
> pnmmargin -white 10 |
> pnmtopng -quiet -compression 9 >minus.png
$

Visible at https://s29.postimg.org/ddwg1okz9/minus.png

The PostScript is using Symbol for \(mi and \(pl.

/F0 10/Times-Roman@0 SF
2.5 (--)72 12 S
2.5(.\\)-2.5 G
(N'45' -.  \\- \255.  \\\(mi)-2.5 E
/F1 10/Symbol SF
(-)2.5 E
F0
2.5(.++)C
2.5(.\\)-2.5 G
(\(pl)-2.5 E
F1
(+)2.5 E
F0
(.)A 0

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Mike Bianchi
Folk,

I've been sort of watching from the sidelines here, but am going to toss in my
2 cents.

First, I once heard  troff/groff  described as the assembly language of
typesetting.  So to my mind it should be "simple" (as in not too complicated) and
stable.  The first goal is forever lost.

Stable, to me, implies not changing much over time, and most changes
should be backward compatible.  troff/groff has by and large met that test.
Having mastered troff at one time the stability has saved me.  But my mastery
has degraded as I have not kept up with all the improvements and never was a
grand master.

Backward compatible means that all code written to the existing definitions
should turn out the same results as in the past when submitted to new
assemblers.
(I have nroff documents and C code from the 1970s that still work.)

Thus when we have pieces of documented definitions that contradict each other
the problem becomes which definition to change.  The definitions for

-   \-   \(mi   \(hy   \(em   \(en   (others?)

should be clear and the implementations should implement them as defined.
To my mind  -  in groff should always default to the ASCII, 7-bit,
undistinguished character.

When we have assemblers that contradict because of the documentation being
inconsistent, what do we do about that?  For me, I want the assembler I use,
groff, to match the corrected documentation.

If different assemblers knowingly disagree with each other it would be a
courtesy to the community to document that fact.  (Witness the documentation
for many of the Linux/Unix/BSD implementations of "the shell".)

So if the current definitions for  -  \-  \(hy  disagree with historical
documents and implementations, they should be documented.
If I am writing at the assembly level, I can always
.char - \-


Given those opinions, I feel it is for the macro packages, the "compilers",
to implement the necessary features such as associating true minus-signs
with numbers and true hyphens with word separators.  And if  -x  is meant to be
keyboard (7-bit ASCII) characters, the compiler should make that so.

The unfortunate history is that the man pages and other ancient documents come
from a time when the users of macros were expected to dive into the assembly
language _frequently_ to get around the things that the macros just did not
address.  And that history is still with us in WYSIWYG (What You See Is What
You Get) word processors.  Want that  -  to be a minus in WYSIWYG?  Dive into
the font table and pick out the character there, if you can find it.

My impression is that some macros, such as Schaffter's Mom, go a long way
towards eliminating the assembly get-arounds.  Still, macros take a programmer's
view of documentation, namely to compile our document source code rather than
format the WYSIWYG input.  Their advantage is that simple "commands" crank out
a lot of assembler code.  Calling something a TITLE implies a lot of specifics.

All that said, the concept of having the compiler decide whether a character
should be a minus, hyphen, minus-hyphen, UTF8-something-or-other, etc. should
be in the realm of a higher level component than troff/groff.

And the fix for old documents, such as the man pages that depend on groff
for their appearance, is to edit their source code so their specifics match
the (corrected?) groff definitions.
Mike


On Tue, May 02, 2017 at 09:29:39PM -0400, Doug McIlroy wrote:
> 
> Branden wrote
> 
> Ingo's proposal would not mandate that + and \- come from the special
> font.
> 
> It also would not mandate that \(pl and \(mi come from the current font.
> 
> 
> --
> 
> I was previously told that \(mi is the true minus sign. But the
> true minus sign, at least in my mind, must come from the current
> font, so that it comes out right wherever it occurs, even in a
> bold headline like "Fairbanks shivers at -50".
> 
> 
> I'll buy Branden's  first assertion, but if + and \- come from the
> current font as they originally did, and \(pl and \(mi come
> from the current font per the previous paragraph, they
> become redundant.
> 
> So I remain confused.
> 
> Doug
 

-- 
 Mike Bianchi
 Foveal Systems

 973 822-2085

 mbian...@foveal.com
 http://www.AutoAuditorium.com
 http://www.FovealMounts.com



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Ralph Corderoy
Hi Mike,

> Stable, to me, implies not changing much over time, and most changes
> should be backward compatible.
...
> Backward compatible means that all code written to the existing
> definitions should turn out the same results as in the past when
> submitted to new assemblers.  (I have nroff documents and C code from
> the 1970s that still work.)

Agreed.

> Thus when we have pieces of documented definitions that contradict
> each other the problem becomes which definition to change.  The
> definitions for
>
>   -   \-   \(mi   \(hy   \(em   \(en   (others?)

\N'45' ?

-       A hyphen for text, e.g. beer-flavoured ice-cream.
\-      A minus sign in the current font.
\(mi    A minus sign in the special font.
\(hy    Another name for plain `-', so a hyphen for text.
\N'45'  Glyph 45 in the current font.

> To my mind  -  in groff should always default to the ASCII, 7-bit,
> undistinguished character.

But it's always meant hyphen in pre-groff troff because it's a lot more
common to want a hyphen in writing than a minus sign.  Then Unicode
decided ASCII minus had too many meanings and couldn't be used for any
of them so created U+2010 for hyphen, and U+2212 for minus sign, and
groff switched to producing those for the hyphen and minus sign, leaving
ASCII minus unreproducible apart from \N'45'.

man pages that were and are considered correctly written have used \-
for a command-line minus sign, e.g. `wc \-l'.  (Incorrect man pages that
wrote `wc -l' can be ignored for the discussion;  there seems to be the
will to fix them.)  For that to paste from a man page, viewed as UTF-8
TTY, PostScript, PDF, browser, ..., it needs to be character 45.
Writing «wc \N'45'l» isn't going to gain support.  :-)  How to produce
it is the issue.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Steffen Nurpmeso
Mike Bianchi  wrote:

I absolutely and completely support this opinion of yours.

Maybe except that it would be nice if some future user could
easily find some documentation and know what to do to get "nice"r
looking output, maybe with some command line argument (variable),
or a configuration file, because all those misuses came in for
a reason, and that cannot simply be neglected, after all.

--steffen
|Ralph says i must not use signatures which spread the light!



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Carsten Kunze
> "G. Branden Robinson" wrote on 3 May 2017 at 01:02:
> 
> The .itc request is a groff extension so an additional layer of
> 
> .ie \n(.g
> 
> could be added.

Where do you want to add this--in the macro package?  This would not be 
necessary, since it is already groff's own package.  We don't need to care about 
those who steal it ;)

I saw testing for \n(.g in manpages--this is a bad idea.  Manpages themselves need 
to be written portably, without testing for this or that formatter.

Carsten



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread G. Branden Robinson
At 2017-05-03T17:24:41+0200, Carsten Kunze wrote:
> > "G. Branden Robinson" wrote on 3 May 2017 at 01:02:
> > 
> > The .itc request is a groff extension so an additional layer of
> > 
> > .ie \n(.g
> > 
> > could be added.
> 
> Where do you want to add this--in the macro package?

Nope.  By "private macro" I mean one defined and used only within one
document.

> I saw testing for \n(.g in manpages--this is a bad idea.  Manpages
> themselves need to be written portably, without testing for this or that
> formatter.

Most \n(.g tests I've seen in man pages are to try to _achieve_
portability, not break it.

E.g., ncurses uses these conditionals in many of its pages:

.ie \n(.g .ds `` \(lq
.el   .ds `` ``
.ie \n(.g .ds '' \(rq
.el   .ds '' ''

Regards,
Branden



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread John Gardner
>
> For that to paste from a man page, viewed as UTF-8 TTY,


Erm, I may be missing something, here... but if monospaced hyphens and
minus signs are optically indistinguishable, what's the worth in
differentiating between either?

IMHO, if any change is to be made, it should be with grotty's handling of \-.
A new escape sequence (or command-line switch) could always be added  for
authors/users who wish for a \- to *always* be rendered as U+2212, even for
Unicode-enabled terminals. Possible example might be character listings
(e.g., groff_char(7), or for Unicode-related documentation).

Of course, that still wouldn't do anything for code-blocks in PDFs. Then
again, I wouldn't be copying code from a PDF without expecting to clean it
up after pasting, anyway...

(I'm sure this suggestion is sounding silly to somebody...)



On 4 May 2017 at 00:51, Ralph Corderoy  wrote:

> Hi Mike,
>
> > Stable, to me, implies not changing much over time, and most changes
> > should be backward compatible.
> ...
> > Backward compatible means that all code written to the existing
> > definitions should turn out the same results as in the past when
> > submitted to new assemblers.  (I have nroff documents and C code from
> > the 1970s that still work.)
>
> Agreed.
>
> > Thus when we have pieces of documented definitions that contradict
> > each other the problem becomes which definition to change.  The
> > definitions for
> >
> >   -   \-   \(mi   \(hy   \(em   \(en   (others?)
>
> \N'45' ?
>
> -       A hyphen for text, e.g. beer-flavoured ice-cream.
> \-      A minus sign in the current font.
> \(mi    A minus sign in the special font.
> \(hy    Another name for plain `-', so a hyphen for text.
> \N'45'  Glyph 45 in the current font.
>
> > To my mind  -  in groff should always default to the ASCII, 7-bit,
> > undistinguished character.
>
> But it's always meant hyphen in pre-groff troff because it's a lot more
> common to want a hyphen in writing than a minus sign.  Then Unicode
> decided ASCII minus had too many meanings and couldn't be used for any
> of them so created U+2010 for hyphen, and U+2212 for minus sign, and
> groff switched to producing those for the hyphen and minus sign, leaving
> ASCII minus unreproducible apart from \N'45'.
>
> man pages that were and are considered correctly written have used \-
> for a command-line minus sign, e.g. `wc \-l'.  (Incorrect man pages that
> wrote `wc -l' can be ignored for the discussion;  there seems to be the
> will to fix them.)  For that to paste from a man page, viewed as UTF-8
> TTY, PostScript, PDF, browser, ..., it needs to be character 45.
> Writing «wc \N'45'l» isn't going to gain support.  :-)  How to produce
> it is the issue.
>
> --
> Cheers, Ralph.
> https://plus.google.com/+RalphCorderoy
>
>


Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Mike Bianchi
On Wed, May 03, 2017 at 03:51:24PM +0100, Ralph Corderoy wrote:
>   : 
>   : 
> -   A hyphen for text, e.g. beer-flavoured ice-cream.
>   : 
> > To my mind  -  in groff should always default to the ASCII, 7-bit,
> > undistinguished character.
> 
> But it's always meant hyphen in pre-groff troff because it's a lot more
> common to want a hyphen in writing than a minus sign.

_I_ would claim this interpretation was a mistake.  ((My opinion only here.))
The  -  character exists on all keyboards.  It is not labeled minus or hyphen
or endash.  It generates the decimal 45 (hex 0x2D, octal 055) character.
That any *roff processor would give it a different meaning is most unfortunate.
Especially because hyphenation is a built in feature of *roff and once there
was the concept of  \(hy , hyphenated words should have used it.
Note please that I am not saying that  -  should be interpreted as  \(mi
either.


>   : 
> \-    A minus sign in the current font.
> \(mi  A minus sign in the special font.

I would claim that  \-  makes sense, but  \(mi  coming from the special font
is a hold-over from the _first_ troff at Bell Labs that was tailored to the
first supported phototypesetter, which held 4 102-character fonts.  They
were Roman, Bold, Italic, and Special (R B I S).  Special was the Greek
alphabet and other needed characters.
Us old-timers fondly remember the Bell System bell.
See
https://en.wikipedia.org/wiki/Troff "CAT phototypesetter"
https://en.wikipedia.org/wiki/CAT_%28phototypesetter%29

((And interestingly, the current S (Symbol) font also contains the numbers,
presumably so that they and the arithmetic and logic operators could all look
alike in mathematical writing.
I'm guessing that *roff does not take the digits from the Symbol font by
default.  I think of that as an effective argument for not making \(mi draw from
the S font by default.))


> \-      A minus sign in the current font.
> \(mi    A minus sign in the special font.
> \(hy    Another name for plain `-', so a hyphen for text.
> \N'45'  Glyph 45 in the current font.

Once fonts distinguished between minus and hyphen with distinctive glyphs
then  \(mi  and  \(hy  should come from the current font, especially if
neither is  \N'45' .

BUT that is MY opinion.  What I am pushing for is that all the groff
documentation speak truth on this matter.


> ... paste from a man page, viewed as UTF-8
> TTY, PostScript, PDF, browser, ..., it needs to be character 45.
> Writing «wc \N'45'l» isn't going to gain support.  :-)
> How to produce it is the issue.

Absolutely.  I propose
wc -l

if  -  was  \N'45'.  It would make sense for future generations.  As a first
generation UNIX citizen it is interesting to contemplate how much longer the
man pages groff documents will be relevant.


-- 
 Mike Bianchi
 Foveal Systems

 973 822-2085

 mbian...@foveal.com
 http://www.AutoAuditorium.com
 http://www.FovealMounts.com



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Carsten Kunze
> "G. Branden Robinson" wrote on 3 May 2017 at 17:30:
> 
> Nope.  By "private macro" I mean one defined and used only within one
> document.

A manpage is "one document". Or what do you refer to?

> Most \n(.g tests I've seen in man pages are to try to _achieve_
> portability, not break it.
> 
> E.g., ncurses uses these conditionals in many of its pages:
> 
> .ie \n(.g .ds `` \(lq
> .el   .ds `` ``
> .ie \n(.g .ds '' \(rq
> .el   .ds '' ''

So all formatters except groff are only allowed to have second class output?  
Exactly because of this poor code Heirloom sets .g to 1.  I assume also 
mandoc(1) reads \n(.g as 1.  Was this the intention behind register .g in 
groff?  So when all relevant tools set .g to 1, what is the point of this 
register? ;)

Three tools know of these special characters.  Either change other tools to 
support groff_char(7), groff_man(7) and groff_mdoc(7) or replace them with 
one of the three tools.



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Carsten Kunze
> Carsten Kunze wrote on 3 May 2017 at 21:37:
> 
> > E.g., ncurses uses these conditionals in many of its pages:
> > 
> > .ie \n(.g .ds `` \(lq
> > .el   .ds `` ``
> > .ie \n(.g .ds '' \(rq
> > .el   .ds '' ''

I overlooked the word ncurses...

Ok, there were days when this distinction did make sense.  But it should 
not be used in new code anymore...

Carsten



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread John Gardner
Is there literally no way to identify when a modern (non-GNU) troff is
being used?

On 4 May 2017 at 05:46, Carsten Kunze  wrote:

> > Carsten Kunze wrote on 3 May 2017 at 21:37:
> >
> > > E.g., ncurses uses these conditionals in many of its pages:
> > >
> > > .ie \n(.g .ds `` \(lq
> > > .el   .ds `` ``
> > > .ie \n(.g .ds '' \(rq
> > > .el   .ds '' ''
>
> I overlooked the word ncurses...
>
> Ok, there were days when this distinction did make sense.  But it
> should not be used in new code anymore...
>
> Carsten
>
>


Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Carsten Kunze
> John Gardner wrote on 3 May 2017 at 21:55:
> 
> 
> Is there literally no way to identify when a modern (non-GNU) troff is
> being used?

General typesetting is something else.  Heirloom has this kludge only for 
manpages, neatroff (AFAIK not used for manpages) likely does not set .g.

There are ways to detect the formatter but a manpage must not do this.  IMHO a 
manpage should suppose that groff is used.  If groff has bugs (e.g. compared to 
mandoc(1)) they should be fixed.

Carsten



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Carsten Kunze
> "G. Branden Robinson" wrote on 3 May 2017 at 22:47:
> 
> So ncurses should be gating on the definition of the glyph rather than
> on whether groff is the typesetter, right?
> 
> .ie c \(lq .ds `` \(lq
> .el        .ds `` ``
> .ie c \(rq .ds '' \(rq
> .el        .ds '' ''
> 
> What do you think?

Short answer:  We should distinguish between manpages and other typesetting.  For 
general typesetting a document is created for one special tool.  If not, if it is 
e.g. a macro package like -mom, it needs to detect the tool to make use of the 
tool's special features.  There are severe differences (not in general but regarding 
special features) between groff, Heirloom and neatroff.  If you e.g. intend to 
write a book, it may be better to choose one of these tools and then write a 
document using all powerful features, which consequently is not portable--but 
this is ok.

Manpages should use the -man or -mdoc macros themselves with as few low 
level requests or escapes as possible.  -mdoc should not need low level *roff 
elements at all.  As Ingo said, there are three major tools.  Most systems IMHO 
use groff, more and more are using mandoc(1).  If there is an ancient system with 
an old *roff tool, this tool should be replaced.

To answer your question:  Simply use \(lq etc. (in manpages) and assume the 
tool supports it.

Carsten



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Ingo Schwarze
Hi,

Carsten Kunze Heirloom wrote on Wed, May 03, 2017 at 09:37:21PM +0200:

> I assume also mandoc(1) reads \n(.g as 1.

Yes:

 $ echo '\\n(.g' | mandoc | sed -n 5p   
1

 $ less /co/mdocml/roff.c
int
roff_getreg(const struct roff *r, const char *name)
{
        int val;

        if ('.' == name[0] && '\0' != name[1] && '\0' == name[2]) {
                val = roff_getregro(r, name + 1);
                if (-1 != val)
                        return val;
        }
        /* ... handle read-write registers ... */
}

/*
 * Handle some predefined read-only number registers.
 * ...
 */
static int
roff_getregro(const struct roff *r, const char *name)
{
        switch (*name) {
        case 'g':  /* Groff compatibility mode is always on. */
                return 1;
        /* ... handle some other read-only registers ... */
        }
}
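
The dispatch in that excerpt can be paraphrased in Python to make the consequence easy to try out. This is only a sketch of the logic shown above, not mandoc code: the `registers` dict and its default of 0 stand in for the elided read-write handling and are assumptions, as are the function names.

```python
def getreg_readonly(name):
    """Model of the excerpt's read-only lookup; only 'g' is covered here."""
    if name == "g":
        return 1  # groff compatibility mode is always on
    return -1     # not a read-only register known to this sketch


def getreg(registers, name):
    """Look up a number register, trying the read-only ones first."""
    # Read-only registers have two-character names that start with '.'.
    if len(name) == 2 and name[0] == ".":
        val = getreg_readonly(name[1:])
        if val != -1:
            return val
    # Assumption: a read-write register that was never set reads as 0.
    return registers.get(name, 0)
```

With this model `getreg({}, ".g")` is 1, so a page that branches on `\n(.g` takes the groff branch under such a formatter as well.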



It's a pervasive pattern:  When writing portable software (or in
this case, portable documentation), and when trying to figure out
whether the platform we are currently trying to run on supports a
given feature we wish to use, *never* test for software or operating
system names or for version numbers.  That plainly doesn't work in
practice: If you test for names, the functionality of those programs
or systems will change over time, and if you test for version
numbers, you will run into implementations you never considered
(and hence don't know any version numbers for) but that still support
the feature, or you run into implementations that do have version
numbers, but where they mean something completely unexpected.

For software, write feature tests instead and configure your software
accordingly for compilation.
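
A minimal Python sketch of what a feature test looks like, as opposed to a name or version test. `strip_prefix` is a hypothetical helper written for this illustration; `str.removeprefix` is a real method that only newer Python versions provide, which is exactly why the code asks for the method rather than for a version number.

```python
def strip_prefix(text, prefix):
    """Remove prefix from text if present, whatever the interpreter offers."""
    # Feature test: ask whether the capability exists, not which version runs.
    if hasattr(str, "removeprefix"):
        return text.removeprefix(prefix)
    # Fallback for interpreters that lack the method.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text
```

The same function keeps working on an implementation nobody considered when it was written, as long as that implementation either has the method or supports the fallback.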

For manual pages, do not write any test code at all.  If you put
your feature tests into the manual pages themselves, they are not
only exceedingly ugly, but often a bigger portability problem than
the features you are trying to test for in the first place.  If you
keep the feature tests separate and try to automatically edit the
manual page source code accordingly before installing the manual
pages, that usually turns into a maintenance nightmare.  Manual
pages just don't need the compilation step that program code needs,
and inventing one just for autoconfiguration purposes is seriously
going over the top.

If writing man(7), stick to the lowest common denominator features
that are supported by practically everything.

If writing mdoc(7), write it for the modern standard as documented
in groff_mdoc(7) and mandoc mdoc(7).  Heirloom copes with that as
well, and legacy implementations are virtually inexistent.

Yours,
  Ingo



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Ingo Schwarze
Hi Branden,

> .ie c \(lq .ds `` \(lq
> .el        .ds `` ``
> .ie c \(rq .ds '' \(rq
> .el        .ds '' ''
> 
> What do you think?

It doesn't work:

 $ uname -a
SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
 $ cat tmp.roff
.ie c \(lq .ds `` \(lq
.el        .ds `` ``
.ie c \(rq .ds '' \(rq
.el        .ds '' ''
>>>\*(``hello world!\*(''<<<
 $ nroff tmp.roff
>>>hello world!<<<
 $ troff tmp.roff | /usr/lib/lp/postscript/dpost | sed -n '/>>>/,/<<</p'
(>>>)720 120 w
10 R f
(hello world!)1 499 1 888 120 t
10 S1 f
(<<<)1387 120 w

 $ nroff
>>> 
.ie c \(mi defined
.el undefined
<<<
^D
>>> <<<
 $ echo '>>>\(lqhello world!\(rq<<<' | nroff
>>>hello world!<<<

There are real-world systems (sold today) where neither \(lq
nor the 'c' conditional is supported.  And yes, that kind
of nroff may be used for manual page display by default:

 $ strings `which man` | grep roff
/usr/lib/sgml/sgml2roff
troff
lp -c -T troff
nroff -u0 -Tlp

Yours,
  Ingo



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread James K. Lowden
On Wed, 3 May 2017 22:06:10 +0200 (CEST)
Carsten Kunze  wrote:

> There are ways to detect the formatter but a manpage must not do
> this.  

Why not?  ISTM we'd have better manpages if they weren't constrained to
the rendering capability of a VT-100 terminal.   For example, equations
or pictures could augment the text, or replace some of it, when
"printed".  

--jkl



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread James K. Lowden
On Wed, 3 May 2017 13:42:55 -0400
Mike Bianchi  wrote:

> The  -  character exists on all keyboards.  It is not labeled minus
> or hyphen or endash.  It generates the decimal 45 (hex 0x2D, octal
> 055) character. That any *roff processor would give it a different
> meaning is most unfortunate.

IMO that is the principle that should be applied: every unadorned
character appearing in troff input should represent itself.  If  you
want something other than that, groff_char(7) describes your options.  

IIUC, this debate about how to render - and \- stems from a conflict in
historical practice.  Is the following correct?  

When troff was young, terminals were ascii and the - character
was 0x2d.  Manpage guidelines encouraged the use of \- for flags because
they rendered nicely in printed documents with no harm done to nroff
output.  They did that despite the obvious fact that the manpage is
there to describe what to type, and basically no one can type the
denoted character.  

Then Unicode pronounced that 0x2d was neither fish nor fowl,
and gave us hyphen, minus, and endash characters.  groff dutifully
mapped - onto hyphen and \- onto minus.  But when terminals gained Unicode
capability, some of them lost cut-and-paste convenience.  The debate is
over how to recover that convenience.  

Oddly, my system doesn't exhibit any cut-and-paste anomaly despite
using xterm with the "-en UTF-8" option.  Searching for - in less also
works.

If it's a UI issue we're confronting, perhaps it's really up to the UI
to deal with.  The man utility can certainly impose on nroff the
requirement that - and \- both render as 0x2d.  Then it shows up
correctly in the pager.  It is visually acceptable to the user, and
DTRT regarding the UI.  (Maybe that's what Ubuntu LTS does for me; I
don't know.) 
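
One way to picture such a UI-level fix is a filter sitting between nroff and the pager. The Python sketch below only illustrates that idea under that assumption; it is not what man(1) or any distribution is known to do, and `DASH_MAP` and `asciify_dashes` are hypothetical names.

```python
# Map the Unicode hyphen and minus sign back to character 45.
DASH_MAP = str.maketrans({"\u2010": "-", "\u2212": "-"})


def asciify_dashes(text):
    """Return text with U+2010 and U+2212 replaced by ASCII 0x2D."""
    return text.translate(DASH_MAP)
```

Because the replacement works one character at a time, everything else in the formatter output passes through unchanged.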

It's not obvious to me groff should make any change at all.  At most,
reverting the mapping of - so that it outputs 0x2d again would undo a
nonobvious, subtle change in favor of simplicity.  

Possibly some degree of outreach to the UI community would be a service,
too.  

--jkl



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread G. Branden Robinson
At 2017-05-04T01:04:48+0200, Ingo Schwarze wrote:
> Hi Branden,
> 
> > .ie c \(lq .ds `` \(lq
> > .el        .ds `` ``
> > .ie c \(rq .ds '' \(rq
> > .el        .ds '' ''
> > 
> > What do you think?
> 
> It doesn't work:
> 
>  $ uname -a
> SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
>  $ cat tmp.roff
> .ie c \(lq .ds `` \(lq
> .el        .ds `` ``
> .ie c \(rq .ds '' \(rq
> .el        .ds '' ''
> >>>\*(``hello world!\*(''<<<
>  $ nroff tmp.roff
> >>>hello world!<<<
>  $ troff tmp.roff | /usr/lib/lp/postscript/dpost | sed -n '/>>>/,/<<</p'
> (>>>)720 120 w
> 10 R f
> (hello world!)1 499 1 888 120 t
> 10 S1 f
> (<<<)1387 120 w
> 
>  $ nroff
> >>> 
> .ie c \(mi defined
> .el undefined
> <<<
> ^D
> >>> <<<
>  $ echo '>>>\(lqhello world!\(rq<<<' | nroff
> >>>hello world!<<<
> 
> There are real-world systems (sold today) where neither \(lq
> nor the 'c' conditional is supported.  And yes, that kind
> of nroff may be used for manual page display by default:
> 
>  $ strings `which man` | grep roff
> /usr/lib/sgml/sgml2roff
> troff
> lp -c -T troff
> nroff -u0 -Tlp

N.B. my comments here are situated in a context _outside_ of the GNU
Troff project.

Why do my man pages need to be more portable than the shell scripts or C
code I ship with them?

What is the value in reading the _formatted_ version of a man page for a
tool that won't compile or run correctly on the host?

I refuse to write shell scripts for general-purpose consumption only in
the historical Bourne dialect that Solaris /bin/sh was, and I refuse to
write man pages for general-purpose consumption only in some minimal
common subset that _no one_ has troubled themselves to carefully define.

As I've said before, I fear the talk of "safe subsets" and portability
in groff_man(7) and man(7) are years out of date and in places
anticipated a golden age of direct man-to-HTML rendering that never came
to pass for several reasons.

I want the "man language" to be small, as I also said before, but not
cripplingly so, and I want it to be informed by design, not accidental
overlaps in Venn diagrams of traits supported by unmaintained
implementations.

For example, even if that old Solaris system can render the output of
docbook-to-man legibly, that wouldn't mean that docbook-to-man's output
is well-considered or a model to emulate in any particulars.

Finally, Groff's own portability is sufficiently broad that we should be
able to say, "as a rule, if you want man pages written in the past 25
years to render nicely on Boozix 11.0, Gizmoware's custom hybrid of
4.3BSD and SVr3, please build and install Groff so that its macro
packages can achieve this for you."

Regards,
Branden



Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread G. Branden Robinson
At 2017-05-03T20:13:29-0400, James K. Lowden wrote:
> On Wed, 3 May 2017 22:06:10 +0200 (CEST)
> Carsten Kunze  wrote:
> 
> > There are ways to detect the formatter but a manpage must not do
> > this.  
> 
> Why not?  ISTM we'd have better manpages if they weren't constrained
> to the rendering capability of a VT-100 terminal.

Oh, it can get much worse than that.  If we were to tell man page
writers to target only the common subset of features implemented by
everything that's claimed to be VT-100-compatible over the years, we
would be left only with that which devascii offers, and even that is
probably too sophisticated.

> For example, equations or pictures could augment the text, or replace
> some of it, when "printed".

Yes.  For the past several years, xterm has supported DEC's ReGIS[1] and
Sixel[2] graphics.

Note that when people call something "VT-100-compatible", they generally
mean some smorgasbord of features cribbed from the VT100 and its many
successor terminals (VT220, VT320, VT420, VT520, VT525), whatever seemed
cool and/or was easy to implement.

At the same time, the finer details of getting baseline VT100 emulation
completely correct with respect to, say, ACS ("alternate character set")
and cursor movement were frequently neglected.

[1] https://www.youtube.com/watch?v=Dmrmj5y72kg

[2] https://upload.wikimedia.org/wikipedia/commons/6/62/W3m-wikipedia.png

Regards,
Branden



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Ingo Schwarze
Hi Ralph,

Ralph Corderoy wrote on Wed, May 03, 2017 at 03:51:24PM +0100:

> -       A hyphen for text, e.g. beer-flavoured ice-cream.
> \-      A minus sign in the current font.
> \(mi    A minus sign in the special font.
> \(hy    Another name for plain `-', so a hyphen for text.
> \N'45'  Glyph 45 in the current font.

The trouble with \N'45' is that it has no fixed meaning
and that the resulting glyph varies wildly.

Even if you only look at groff and only at git master HEAD,
\N'45' means:

 -Tascii:   U+002D HYPHEN-MINUS
 -Tlatin1:  U+002D HYPHEN-MINUS
 -Tutf8:    U+002D HYPHEN-MINUS
 -Thtml:    U+002D HYPHEN-MINUS
 -Tcp1047:  nothing useful, a control character (ENQ, enquiry character)
 -Tps:      hyphen; that's the same character as \(hy
 -Tpdf:     hyphen; that's the same character as \(hy
 -Tdvi:     hyphen; that's the same character as \(hy
 -Tlbp:     hyphen; that's the same character as \(hy
 -Tlj4:     undefined, no character at all

While this is clearly what you want for -Tascii, -Tlatin1, -Tutf8,
and -Thtml, and in particular for -Tutf8 where it is a reasonably
wide glyph looking like a minus sign and also the right character
for copy and paste, it is dubious whether \N'45' is what you want
for -Tps, -Tpdf, -Tdvi, and -Tlbp.

First, the fact that the character number agrees with ASCII doesn't
mean much.  While the arrangement of the glyphs for the TR font of
the -Tps device is loosely based on ASCII, the codepoints for several
characters mismatch.  For example, U+0027 APOSTROPHE, which you get
with \(aq, is codepoint 8 in -Tps TR, *not* codepoint 39 as you
would expect, which is instead \(cq = U+2019 RIGHT SINGLE QUOTATION
MARK.  U+0060 GRAVE ACCENT is codepoint 146, not 96, which is instead
\(oq = U+2018 LEFT SINGLE QUOTATION MARK.  U+005E CIRCUMFLEX ACCENT
has two associated codepoints, the expected 94 (produced with ^)
and the unexpected 0 (\(ha).  So has U+007E TILDE, the expected 126
(~) and the unexpected 1 (\(ti).  It is also amusing that in the
appendix below, the character \(en transforms to seven different
glyph numbers in nine different output devices, and only one of
them is related to ASCII, which emphasizes the weak relationship
between glyph numbers (even for -Tps) and ASCII character numbers.

Then, the glyph you get for \N'45' in -Tps TR is *not* similar to
the wide glyph that you would expect for U+002D HYPHEN-MINUS.
Instead, it is a typical, short hyphen.

So, which glyph in -Tps TR represents U+002D HYPHEN-MINUS?
Asking the question that way, the full dilemma becomes obvious:
Just like in classical typography, there is *none*.

So not only does groff provide no way to request an output glyph
for "ASCII -", worse, the fonts for the important -Tps and -Tpdf
devices do not even contain such a glyph!

Consider a program that wants to copy text out of a groff-generated
PostScript or PDF document and paste it into a Unicode terminal
window.  Such a program could reasonably be expected to translate
-Tps TR codepoint 45 into U+2010 HYPHEN because that's what you get
from the unambiguous \(hy input character, and it could reasonably
be expected to translate -Tps TR codepoint 173 into U+2212 MINUS
SIGN because, as Doug kindly reminded us, \- is the minus sign in
the current font and \(mi is not in the TR font at all.  But now
we have already exhausted the glyphs in -Tps TR and there is no
glyph left that could be converted into U+002D HYPHEN-MINUS.
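
The dead end can be made concrete with a small lookup.  This is a sketch
only: the function name tr_glyph_to_unicode is hypothetical, it covers just
the two -Tps TR glyph numbers discussed above, and it describes what a
copy-and-paste tool might reasonably choose, not what any existing viewer
does.

```shell
# Hypothetical helper: map a -Tps TR glyph number to the Unicode
# character a copy-and-paste tool could reasonably pick for it.
tr_glyph_to_unicode() {
    case "$1" in
        45)  printf '%s\n' 'U+2010 HYPHEN' ;;      # what \(hy produces
        173) printf '%s\n' 'U+2212 MINUS SIGN' ;;  # what \- produces
        *)   printf '%s\n' 'unmapped' ;;           # not discussed here
    esac
}

for n in 45 173; do
    tr_glyph_to_unicode "$n"
done
```

Neither branch yields U+002D HYPHEN-MINUS, which is the whole difficulty.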

Many people have said during this discussion that they wouldn't
expect copy and paste from a PDF viewer to a UTF-8 terminal to work.
The above may well be part of the precise reason why indeed it
cannot fully work.

On the other hand, i stumbled because nowadays, we have got used
to the feeling that Unicode might be a superset of everything.  So
i considered \- and \(mi redundant because both map to U+2212, and
so i hoped one of them might be up for grabs.  Yet Doug kindly
reminded us what the distinction is, and that distinction indeed
cannot be represented in terms of Unicode codepoints.

So i have to reluctantly conclude that your original problem, Ralph,
of requesting "a glyph representing U+002D HYPHEN-MINUS" is unsolvable.
At least not without adding yet another glyph to the -Tps TR font.
But even that would hardly help, for two reasons: all PostScript
and PDF viewer software would first have to catch up, correctly
recognize the glyph, and correctly translate it to U+002D HYPHEN-MINUS
- but the ecosystem such software lives in evolves incredibly slowly
nowadays.  And then you would have to define a new character escape
sequence to access the new glyph, since all the existing escapes
already mean something else.  But that's effectively a reductio ad
absurdum: telling all manual page authors to, henceforth, write "wc
\(hml" is not going to fly.  Very few would understand and follow
that, and even i would tend to resist it as excessive complication.


So i fear we are left with the traditional workaround:

Re: [Groff] Critique this bold-italic private macro for man pages

2017-05-03 Thread Ingo Schwarze
Hi Branden,

G. Branden Robinson wrote on Wed, May 03, 2017 at 08:52:42PM -0400:

> Why do my man pages need to be more portable than the shell scripts
> or C code I ship with them?

They need not, but i would consider aiming for about the same level
of portability reasonable.  Meaning, that they work on systems that
you target.

> What is the value in reading the _formatted_ version of a man page
> for a tool that won't compile or run correctly on the host?

That's indeed not very high priority, but why do you think that
any (reasonably portable) software would not run on Solaris 11?

I do not want to promote Oracle software, and in fact i use none
for my own purposes.  But Solaris 11 is a certified UNIX system
conforming to the 2016 edition of the Single UNIX Specification,
version 4:

  https://www.opengroup.org/openbrand/register/brand3585.htm

That implies that the native compiler conforms to ISO C 99
and that the native /bin/sh is a POSIX shell.

> I refuse to write shell scripts for general-purpose consumption
> only in the historical Bourne dialect that Solaris /bin/sh was,

That doesn't apply to Solaris 11.

> and I refuse to write man pages for general-purpose consumption
> only in some minimal common subset that _no one_ has troubled
> themselves to carefully define.

Admittedly, C and sh(1) are well standardized, and man(7) is not.
All the same, would you want your manual pages to break on systems
where your program compiles and runs just fine?

Yes, Solaris typically does need some porting work for code that
is mostly developed on BSD or Linux, even though it is a POSIX
system.  But porting is usually not a big deal.  At least not for
Solaris 11.  Solaris 9, on the other hand, is a bit more of an
adventure, but i didn't talk about the nroff you might find there,
either.

[...]
> For example, even if that old Solaris system

I often snarkily call any version of Solaris "archaic software"
myself.  All the same, this particular system is a very new system
on sale today.

> can render the output of docbook-to-man legibly,

It absolutely cannot.  On that system, even though it is the newest
that you can buy from Oracle today, any kind of docbook-to-man
output will fall flat on its face before it even finds the time
to be surprised enough to utter a grunt.

[...]
> Finally, Groff's own portability is sufficiently broad that we should be
> able to say, "as a rule, if you want man pages written in the past 25
> years to render nicely on Boozix 11.0, Gizmoware's custom hybrid of
> 4.3BSD and SVr3, please build and install Groff so that its macro
> packages can achieve this for you."

As a matter of fact, groff is already installed on that system:

 $ groff -v | head -n1
GNU groff version 1.19.2

It is a very new Solaris system and not exactly the newest groff
available.  So, you want to use groff?  Fine, read the manual!

 $ man groff 2>&1 | cat
Reformatting page.  Please Wait... done
 $ echo $?
0

Hum.  It looks like that manual page uses features that the native
man(1) cannot handle, likely because it uses the native nroff(1).

So, to read the groff manual pages, you already need to know how to
use groff:

 $ gnroff -c -mandoc /usr/share/man/man1/groff.1 | less

Voila.  Now it works.


I admit such a system is hard to use, and i wouldn't use it for
productive work myself.  But completely dismissing it when
talking about "portable software" is not gonna cut it.

Oh, by the way, if you don't like the native shell:

 $ bash --version | head -n2
GNU bash, version 4.1.17(1)-release (sparc-sun-solaris2.11)
Copyright (C) 2009 Free Software Foundation, Inc.

Same pattern as with groff: Not at all up to date, but *something*
is there that is maybe almost acceptable for some purposes.

Yours,
  Ingo



Re: [Groff] ASCII Minus Sign in man Pages

2017-05-03 Thread Ingo Schwarze
Hi James,

James K. Lowden wrote on Wed, May 03, 2017 at 08:13:18PM -0400:

> IIUC, this debate about how to render - and \- stems from a conflict in
> historical practice.  Is the following correct?  
> 
>   When troff was young, terminals were ascii and the - character
> was 0x2d.  Manpage guidelines encouraged the use of \- for flags because
> they rendered nicely in printed documents with no harm done to nroff
> output.  They did that despite the obvious fact that the manpage is
> there to describe what to type, and basically no one can type the
> denoted character.  
> 
>   Then Unicode pronounced that 0x2d was neither fish nor fowl,
> and gave us hyphen, minus, and endash characters.  groff dutifully
> mapped - onto hyphen \- onto minus.  But when terminals gained Unicode
> capability, some of them lost cut-and-paste convenience.

So far, that is maybe somewhat simplified, but more or less to the
point.  For details of early runoff/roff history, see

  http://manpages.bsd.lv/history.html

Basically, you are starting your narrative in 1973.  At that point,
the language was about nine years old and had seen at least ten
earlier implementations in about eight different programming languages
on about five different operating systems on about eight different
machines by at least ten different authors.  Finding out how all
those handled "-" when they were young might be non-trivial.

> The debate is over how to recover that convenience.  

No, it is not.  That was solved long ago, at the latest here:

  commit 98acc924f4e32cfc2209df5db0c21921df8cc7ac
  Author: Werner LEMBERG 
  Date:   Fri Jan 2 23:16:20 2009 +

* tmac/an-old.tmac, tmac/doc.tmac: For -Tutf8, map \-, -, ', and `
conservatively to ASCII for the sake of easy cut and paste.

The debate is over three different topics:

 1. Cut and paste from -Tps, -Tpdf, and -Thtml.

 2. What to use if ASCII HYPHEN-MINUS is desired in the output,
both in manual pages and in other documents.

 3. What to use if a mathematical minus sign is desired in the
output.

> Oddly, my system doesn't exhibit any cut-and-paste anomaly despite
> using xterm with the "-en UTF-8" option.  Searching for - in less
> also works.

Yes, due to Werner's change in 2009 quoted above.
One of the effects is that in manual pages, "-" and "\-"
in the input always render as U+002D HYPHEN-MINUS in -Tutf8.
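
For illustration, a mapping of this kind can be written with groff's rchar
and char requests.  The fragment below is a sketch of the approach, not a
verbatim quote of tmac/an-old.tmac, so check the real file before relying
on it; the shell function name emit_utf8_ascii_map is made up for this
example.

```shell
# Hypothetical helper: print a roff fragment that, for -Tutf8 only,
# maps \- - ' and ` to their ASCII code points.
emit_utf8_ascii_map() {
    cat <<'EOF'
.\" Assumed fragment: conservative ASCII mapping for easy cut and paste.
.if '\*[.T]'utf8' \{\
.  rchar \- - ' `
.  char \- \N'45'
.  char  - \N'45'
.  char  ' \N'39'
.  char  ` \N'96'
.\}
EOF
}

emit_utf8_ascii_map
```

The output could be placed in a site-local macro file such as man.local;
where that file lives depends on the installation.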

> If it's a UI issue we're confronting, perhaps it's really up to the UI
> to deal with.  The man utility can certainly impose on nroff the
> requirement that - and \- both render as 0x2d.  Then it shows up
> correctly in the pager.  It is visually acceptable to the user, and
> DTRT regarding the UI.  (Maybe that's what Ubuntu LTS does for me; I
> don't know.) 

It's not Ubuntu, it's groff itself already doing that for you.

> It's not obvious to me groff should make any change at all.  At most,
> reverting the mapping of - so that it outputs 0x2d again would undo a
> nonobvious, subtle change in favor of simplicity.  

Probably not, because that would break each and every existing
non-manpage roff document.  Besides, i just noticed that it's
completely unclear what "output U+002D HYPHEN-MINUS in a PostScript
or PDF document" is even supposed to mean, see my other mail...

Yours,
  Ingo