Re: groff features for hyperlinked man pages (was: No 6.05/.01 pdf book available)

2023-08-18 Thread Alejandro Colomar
Hi Branden, Deri,

On 2023-08-15 02:50, G. Branden Robinson wrote:
[...]

> 
>>> I just re-read this, and am confused.  '\-' is an ASCII character,
>>> isn't it?  In fact, all of the Linux man-pages pathnames are
>>> composed exclusively of ASCII characters, aren't they?
> 
> You're thinking about this at the wrong level, Alex.  `\-` is a *roff
> special character.  Unless converted to something else by character
> translation or character definition,[7] it goes to the
> device-independent page description language as a special character too.

[...]

> It is up to the output device to decide what to do with that.  groff's
> "ascii" and "latin1" output devices put out a U+002D character; its
> "utf8" device puts out a minus sign, U+2212.  Now, before anyone
> defecates a brick about the U+2212 not being easily greppable, nor
> useful for copying and pasting to a shell prompt, the man(7) and mdoc(7)
> macro packages override that.

So, \- is kept as a special character, even in man(7), until output
drivers translate it to ASCII -?  Or which program does the translation?
If it's gropdf(1) that makes the translation, I guess it will also be
able to perform the same translation for MR.  If the translation has
already been made by troff(1), then gropdf(1) shouldn't care.  In any
case, I still don't see the problem.

[...]

>> .BR persistent\-keyring (7) ,
> [...]
>> Which when converted to .MR calls looks like:-
> [...]
>> .MR "persistent\-keyring" "7" "," "persistent-keyring"
> 
> Urp.  No, it doesn't.  Not unless you changed `MR` in deri-gropdf-ng.
> 
> .BR persistent\-keyring (7) ,
> 
> when converted to an `MR` call, looks like this.
> 
> .MR persistent\-keyring 7 ,
> 
> I expect man page authors would violently protest if they were told they
> had to type all those quotes and, worse, repeat the name of the page.

Indeed.  I won't violently protest to Deri's experiments, as I do worse
aberrations while experimenting, but I would if this went into groff(1).  :)
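
[Illustration, not part of the original message.]  The conversion Branden
describes above, from a `BR` cross reference to a two- or three-argument
`MR` call with no quotes and no repeated page name, can be sketched as a
small filter.  This is only a sketch: the function name `br_to_mr` and the
regular expression are hypothetical, and it handles just the simple
"topic (section) punctuation" shape shown in the thread.

```python
import re

# Matches a line such as ".BR persistent\-keyring (7) ," or ".BR ls (1),":
# a topic, a parenthesized section, and optional trailing punctuation.
# Hypothetical pattern; real pages contain shapes this does not cover.
BR_XREF = re.compile(r'^\.BR\s+(\S+)\s+\(([0-9][A-Za-z0-9]*)\)\s*(\S*)\s*$')


def br_to_mr(line: str) -> str:
    """Rewrite a simple .BR cross reference as an .MR call.

    Lines that do not look like a cross reference are returned unchanged.
    """
    m = BR_XREF.match(line)
    if m is None:
        return line
    topic, section, trailer = m.groups()
    parts = [".MR", topic, section]
    if trailer:
        parts.append(trailer)  # e.g. the "," that follows the reference
    return " ".join(parts)


if __name__ == "__main__":
    print(br_to_mr(r".BR persistent\-keyring (7) ,"))
```

Note that the topic is passed through untouched, so `\-` stays `\-`; what
happens to it afterwards is up to the formatter and the output device.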

> 
> One of the selling points of `MR` is less typing (no parentheses).

I wouldn't really buy it just for that.  ;)
In fact, not having a RM variant, it's more typing for a parenthesized
reference such as (foo(1)).
But yep, not being DRY would be a trigger for burning the streets of Paris.

>  It
> is hard enough to sell that macro on the linux-man list without
> inaccurate claims entering the fray.
> 
> Now, if I understand correctly, it is quite possible that something you're
> doing in your branch is having `MR` call another macro internally to
> prepare a hyperlink with some "anchor"--I won't say "node" because it
> collides with GNU troff internal jargon--information.  (This is
> suggested by the heavy quoting you showed, since when macros call each
> other with arbitrary numbers of arguments, and those arguments need to
> be kept separate in the callee, the caller should use the `\$@` escape
> sequence, which is analogous to the POSIX shell's `$@`.)

I'd expect that the hyperlinking ability should be modifiable with
groff(1) --I don't care at what level of the pipeline--, similar to how
it was modifiable with man2html(1).  But the source code shouldn't know
about it.

[...]

> I don't think it's naughty; I think that by and large, man page authors
> don't care to give "anchor names" to elements of their document.  They
> want the macro package to figure it out.

Indeed.  I'd like groff(1) to figure out some name that resembles the
text used as man page reference, or as section heading; I don't want to
specify it.

>  I think one reason--maybe the
> only reason--people are getting a glimpse inside the sausage factory of
> GNU troff internals is because we haven't had a defined mechanism for
> getting character data to an output device that is neither (1) intended
> for formatting (writing visible glyphs) nor (2) in the printable ASCII
> (Unicode Basic Latin) character set.  That's the aforementioned Savannah
> #63074.[3]
> 
> Looking farther ahead, I think a further step is required if we're going
> to have intra-page links; we're going to have to have a way to
> disambiguate duplicates.  In practice there's not much risk from having
> duplicate section titles in man pages, but I reckon a big, complex page
> could duplicate subsection titles.  And if we automatically generate
> hyperlink tags for paragraph tags, those would likely need it as well.
> Maybe representing such internal anchors hierarchically will be enough:
> "section_subsection_tag" or something like that.

Yep.  I'd expect something like that.  You could also include the page
name in a book, which would involve the changes suggested by Deri of not
having the page title hardcoded as the first level, right?
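
[Illustration, not part of the original message.]  The hierarchical
"section_subsection_tag" idea, with an optional page name as the first
level and a counter to disambiguate duplicates, might look roughly like
this.  Everything here is an assumption made for illustration: the class
`AnchorNamer`, the `slug` helper, the "_" separator, and the "_2" suffix
scheme are not anything groff or gropdf is known to do.

```python
import re
from collections import Counter


def slug(text: str) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into '-'."""
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


class AnchorNamer:
    """Generate hierarchical anchor names and disambiguate duplicates."""

    def __init__(self, page=None):
        # The page name is optional, so it is not hardcoded as level one.
        self.prefix = [slug(page)] if page else []
        self.seen = Counter()

    def name(self, *levels: str) -> str:
        base = "_".join(self.prefix + [slug(level) for level in levels])
        self.seen[base] += 1
        count = self.seen[base]
        # First use keeps the plain name; later uses get a numeric suffix.
        return base if count == 1 else f"{base}_{count}"


if __name__ == "__main__":
    namer = AnchorNamer("groff(1)")
    print(namer.name("Options"))
    print(namer.name("Options", "Input"))
    print(namer.name("Options", "Input"))  # duplicate gets a suffix
```

A numeric suffix could still collide with a heading that ends in the same
digits; a real implementation would have to pick a scheme that rules that
out.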


Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5





Re: Proposed: change `pm` request argument semantics (was: process man(7) (or any other package of macros) without typesetting)

2023-08-18 Thread Alejandro Colomar
Hi Branden, Lennart, Doug,

On 2023-08-18 01:44, G. Branden Robinson wrote:

> In other words, you want to see what a *roff document looks like after
> all macro expansions have been (recursively) performed.

Exactly.  Basically, I want an equivalent of cpp(1) for expanding
man(7) macros.

> 
> I wanted this, too, back in 2017 when I first started working on groff.
> 
> The short answer is "no".
> 
> The longer answer is that this is hard because GNU troff, like AT&T
> troff, never builds a complete syntax tree for the document the way
> "modern" document formatters do.

This gives me some hope.  If it's just that both AT&T troff and GNU
groff have been designed so that they do two things, but can't do one
thing and do it well, then my solution involves writing a manpp(1)
from scratch.  If you tell me that's possible, and possibly the easiest
way, then I may do it some day.

Doug, I'm curious about why the original design of man(7) and
troff(1)/nroff(1) didn't separate this into a macro preprocessor.
Do you remember some details about that?  Was it impossible, or maybe
too much work?

[...]

> 
> I'll say it before Ingo does: mandoc(1) (as I understand it) _does_
> build a syntax tree for the entire document before producing output,
> which enables some of the nice features that it has.
> 
> I see Lennart has replied with some further exploration of the
> challenges here.  Rather than duplicate his comments, let me move on to
> something vaguely related but, I hope, potentially useful.


Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5





Re: groff features for hyperlinked man pages (was: No 6.05/.01 pdf book available)

2023-08-18 Thread G. Branden Robinson
Hi Alex,

At 2023-08-18T15:50:21+0200, Alejandro Colomar wrote:
> On 2023-08-15 02:50, G. Branden Robinson wrote:
> >>> I just re-read this, and am confused.  '\-' is an ASCII character,
> >>> isn't it?  In fact, all of the Linux man-pages pathnames are
> >>> composed exclusively of ASCII characters, aren't they?
[...]
> > You're thinking about this at the wrong level, Alex.  `\-` is a
> > *roff special character.  Unless converted to something else by
> > character translation or character definition,[7] it goes to the
> > device-independent page description language as a special character
> > too.
[...]
> > It is up to the output device to decide what to do with that.
> > groff's "ascii" and "latin1" output devices put out a U+002D
> > character; its "utf8" device puts out a minus sign, U+2212.  Now,
> > before anyone defecates a brick about the U+2212 not being easily
> > greppable, nor useful for copying and pasting to a shell prompt, the
> > man(7) and mdoc(7) macro packages override that.
> 
> So, \- is kept as a special character, even in man(7), until output
> drivers translate it to ASCII -?

In *roff, any character, ordinary or special, can be "translated" to any
other with the `tr` request.

.tr AB \" translate "A" to "B"
.tr -\- \" translate ordinary char '-' to special char '-'
.tr \[aq]' \" translate special char 'aq' to ordinary char "'"

The resemblance to Unix tr(1) is not coincidental.

In GNU troff, context-dependent translations are available (for fairly
specialized purposes--`trin` and `trnt`).  Beneath that,[1] you can
_redefine_ any ordinary or special character.

The formatter applies character translations (and, in GNU troff,
definitions) before producing output.

> Or which program does the translation?

Output devices can perform translations too.  In the above example, "'"
doesn't "remain" "'"; if the output device has directional single
quotes, groff's font descriptions will assign it to the glyph for U+2019
or similar.
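
[Illustration, not part of the original message.]  A toy model of the two
stages described above: the formatter applies `tr`-style translations
first, and the output device then decides which code point to emit for the
special character `\-`.  The device table reflects only what is stated in
this thread (U+002D for "ascii" and "latin1", U+2212 for "utf8").  The
`render` function, the `man_override` flag standing in for what the man(7)
and mdoc(7) packages do, and the choice to pass every other character
through unchanged are simplifying assumptions, not a description of
groff's implementation.

```python
import re

# A token is either the special character "\-" or any single character.
TOKEN = re.compile(r'\\-|.', re.DOTALL)

# What each output device emits for "\-", per the discussion above.
DEVICE_MINUS = {"ascii": "\u002d", "latin1": "\u002d", "utf8": "\u2212"}


def render(text, device, translations=None, man_override=False):
    """Apply formatter-stage translations, then device-stage mapping."""
    translations = translations or {}
    out = []
    for tok in TOKEN.findall(text):
        tok = translations.get(tok, tok)  # formatter stage, like ".tr"
        if tok == r'\-':                  # device stage
            out.append("-" if man_override else DEVICE_MINUS[device])
        else:
            out.append(tok)               # simplification: pass through
    return "".join(out)


if __name__ == "__main__":
    for device in ("ascii", "utf8"):
        print(device, render(r"persistent\-keyring", device))
    # Roughly ".tr -\-": ordinary '-' becomes the special character.
    print(render("a-b", "utf8", translations={"-": r"\-"}))
```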

Some time perusing the 1.23.0 groff_char(7) and groff_font(5) man pages
will be rewarded.  I hope one day soon to revise groff_out(5) and the
"Using Symbols" section of groff's Texinfo manual to my
satisfaction--the latter will likely drive updates to groff(7)--and by
then the path from input characters to visible output glyphs should be
completely illuminated.  If you were to call this stuff frustratingly
complex, I'd agree.  Most of the complexity exists for good reasons,
though some of those are historical.  The responsible update of
technical documentation entails unearthing and presenting those reasons.

> If it's gropdf(1) that makes the translation, I guess it will also be
> able to perform the same translation for MR.

The translation of `\-` to `-` specifically for the purpose of writing
PDF metadata (bookmarks) via troff device control commands is _extremely
specialized_.  No man page author should ever have to deal with it.

> If the translation has already been made by troff(1), then gropdf(1)
> shouldn't care.  In any case, I still don't see the problem.

If Deri and I do our jobs right, you won't need to care, nor see any
problems.  We're workin' on it.  (Mostly Deri has been, to date.  My
"contribution" has mainly been to look at an.tmac on one hand and the
"pdfhref" macro on the other and stare slack-jawed, wondering how the
hell I'll ever get the impedances to match.  Don't be surprised if
something gets refactored.)

> Indeed.  I won't violently protest to Deri's experiments, as I do
> worse aberrations while experimenting, but I would if this went into
> groff(1).  :)

I don't know what the current state of play with respect to a
four-argument `MR` in Deri's branch is.  I'll let him speak to it.  I'd
prefer not to undertake a code review half-cocked (and ill-prepared,
besides).

> > One of the selling points of `MR` is less typing (no parentheses).
> 
> I wouldn't really buy it just for that.  ;)

No indeed, which is why it has much bigger reasons to recommend it,
namely those cited in groff's NEWS file.

  Inclusion of the `MR` macro was prompted by its introduction to
  Plan 9 from User Space's troff in August 2020.  Its purpose is to
  ameliorate several long-standing problems with man page cross
  references: (1) the package's lack of inherent hyperlink support for
  them; (2) false-positive identification of strings resembling man page
  cross references (as can happen with "exit(1)", "while(1)",
  "sleep(5)", "time(0)" and others) by terminal emulators and other
  programs; (3) the unwanted intrusion of hyphens into man page topics,
  which frustrates copy-and-paste operations (this problem has always
  been avoidable through use of the \% escape sequence, but cross
  references are frequent in man pages and some page authors are
  inexpert *roff users); and (4) deep divisions in man page maintenance
  communities over which typeface should be used to set the man page
  topic (italics, roman, or bold).
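
[Illustration, not part of the original message.]  Problem (2) in the NEWS
excerpt is easy to reproduce.  A program that guesses at cross references
from rendered text alone, using a pattern like the hypothetical one below,
cannot tell a genuine reference from C code that merely has the same
shape.  The pattern and the function name `guess_xrefs` are assumptions
for illustration; no particular terminal emulator is being described.

```python
import re

# Naive heuristic: a word immediately followed by a parenthesized section
# number.  Hypothetical; shown only to demonstrate the false positives.
NAIVE_XREF = re.compile(r'\b([A-Za-z_][\w.\-]*)\((\d[a-z]*)\)')


def guess_xrefs(text: str):
    """Return (topic, section) pairs the naive heuristic would link."""
    return [(m.group(1), m.group(2)) for m in NAIVE_XREF.finditer(text)]


if __name__ == "__main__":
    sample = "Call exit(1) inside while(1); see also ls(1)."
    for topic, section in guess_xrefs(sample):
        print(f"{topic}({section})")
```

All three strings in the sample match, although only the last is meant as
a man page reference.  An explicit macro such as `MR` lets the author mark
the real ones, so nothing downstream has to guess.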

> In fact, not having a RM variant