[PATCH] new built-in string-valued register: .trap

2023-07-26 G. Branden Robinson
Background:

https://savannah.gnu.org/bugs/index.php?64212

Now's a good time to tell me if this is a bad idea.  :P

The idea is to make it easier to debug issues like the following.

https://savannah.gnu.org/bugs/index.php?56499
https://savannah.gnu.org/bugs/index.php?58447

Here's the patch.

commit 0ccb60f70e3f6bf2355663a389f8e83af53c8c52
Author: G. Branden Robinson 
Date:   Wed Jul 26 02:58:27 2023 -0500

[troff]: Add new `.trap` built-in register.

* src/roff/troff/div.h (class diversion, class macro_diversion, class
  top_level_diversion): Declare new member function
  `get_next_trap_name`.

* src/roff/troff/div.cpp (macro_diversion::get_next_trap_name): New
  member function returns name of diversion trap if any, and if its
  position is greater than the vertical drawing position, otherwise an
  empty string.

  (top_level_diversion::get_next_trap_name): New member function returns
  the name of the next vertical position trap, if any, otherwise an
  empty string.

  (class next_trap_name_reg): New class has one member, a `get_string()`
  function.

  (next_trap_name_reg::get_string): New function.

  (init_div_requests): Add `.trap` to the register dictionary and wire
  it up to `next_trap_name_reg`.

* doc/groff.texi (Page Location Traps):
* man/groff.7.man (Read-only registers):
* man/groff_diff.7.man (New registers): Document it.

* src/roff/groff/tests/dot-trap_register_works.sh: Test it.
* src/roff/groff/groff.am (groff_TESTS): Run test.

* NEWS: Add item.

Fixes .  Thanks to Dave Kemper for
feedback.
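To illustrate the two lookups described above, here is a minimal C++ sketch. It is not groff's code. `Trap`, `diversion_next_trap_name()`, and `top_level_next_trap_name()` are hypothetical names. Positions are plain integers in basic units, assumed to be already resolved to absolute page positions, so negative `.wh` offsets and the page length are ignored. An empty string stands for "no upcoming trap".

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a planted trap; not a groff type.
struct Trap {
  std::string name;  // macro the trap calls
  int position;      // absolute vertical position in basic units (assumed)
};

// Diversion case: a diversion has at most one trap (planted with .dt).
// Report it only if it lies strictly below the drawing position.
std::string diversion_next_trap_name(const std::optional<Trap>& dt,
                                     int vertical_position) {
  if (dt && dt->position > vertical_position)
    return dt->name;
  return "";
}

// Top-level case: scan the page location traps (.wh, .ch) and report the
// nearest one strictly below the drawing position; on equal positions,
// the earlier list entry wins.
std::string top_level_next_trap_name(const std::vector<Trap>& traps,
                                     int vertical_position) {
  const Trap* next = nullptr;
  for (const Trap& t : traps)
    if (t.position > vertical_position &&
        (next == nullptr || t.position < next->position))
      next = &t;
  if (next == nullptr)
    return "";
  return next->name;
}
```

In this model, a trap sitting exactly at the current drawing position is not reported. That follows the "greater than" wording above for diversions. Whether groff's top-level lookup treats that boundary the same way is an assumption here.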

diff --git a/ChangeLog b/ChangeLog
index a184500e1..d5311f09c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,35 @@
+2023-07-26  G. Branden Robinson 
+
+   [troff]: Add new `.trap` built-in register.
+
+   * src/roff/troff/div.h (class diversion, class macro_diversion,
+   class top_level_diversion): Declare new member function
+   `get_next_trap_name`.
+   * src/roff/troff/div.cpp (macro_diversion::get_next_trap_name):
+   New member function returns name of diversion trap if any, and
+   if its position is greater than the vertical drawing position,
+   otherwise an empty string.
+   (top_level_diversion::get_next_trap_name): New member function
+   returns the name of the next vertical position trap, if any,
+   otherwise an empty string.
+   (class next_trap_name_reg): New class has one member, a
+   `get_string()` function.
+   (next_trap_name_reg::get_string): New function.
+   (init_div_requests): Add `.trap` to the register dictionary and
+   wire it up to `next_trap_name_reg`.
+
+   * doc/groff.texi (Page Location Traps):
+   * man/groff.7.man (Read-only registers):
+   * man/groff_diff.7.man (New registers): Document it.
+
+   * src/roff/groff/tests/dot-trap_register_works.sh: Test it.
+   * src/roff/groff/groff.am (groff_TESTS): Run test.
+
+   * NEWS: Add item.
+
+   Fixes .  Thanks to Dave
+   Kemper for feedback.
+
 2023-07-21  G. Branden Robinson 
 
* src/roff/troff/node.cpp (unbreakable_space_node::tprint):
diff --git a/NEWS b/NEWS
index 92a895cf2..0bd1348f5 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,9 @@ o GNU troff output now reports unbreakable spaces (those produced with
   the `\~` escape sequence) as word breaks with the documentary 'w'
   command, just as regular breakable spaces do.
 
+o A new read-only, string-valued register, `.trap`, interpolates the
+  name of the next vertical position trap that will be sprung.
+
 eqn
 ---
 
diff --git a/doc/groff.texi b/doc/groff.texi
index e8e62c05c..a5d66e112 100644
--- a/doc/groff.texi
+++ b/doc/groff.texi
@@ -14693,6 +14693,13 @@ @node Page Location Traps
 sense to interpolate it outside of macros called by traps.
 @endDefreg
 
+@Defreg {.trap}
+@cindex next trap name register (@code{.trap})
+@cindex trap name, next, register (@code{.trap})
+This read-only, string-valued register interpolates the name of the next
+vertical position trap that will be sprung.
+@endDefreg
+
 @Defreg {.pe}
 @cindex @code{bp} request, and traps (@code{.pe})
 @cindex traps, sprung by @code{bp} request (@code{.pe})
diff --git a/man/groff.7.man b/man/groff.7.man
index 91809aefb..ce1ae7779 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -6265,6 +6265,16 @@ .SS "Read-only registers"
 (string-valued).
 .
 .TP
+.REG .trap
+Name of the next vertical position trap that will be sprung
+(string-valued);
+see
+.request .wh ,
+.request .ch ,
+and
+.request .dt .
+.
+.TP
 .REG .trunc
 Amount of vertical space truncated by the most recently sprung
 vertical position trap,
diff --git a/man/groff_diff.7.man b/man/groff_diff.7.man
index 90901ea56..da35a3b78 100644
--- a/man/groff_diff.7.man
+++ b/man/groff_diff.7.man
@@ -4469,6 +4469,14 @@ .S

Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character

2023-07-26 G. Branden Robinson
Hi Thomas,

At 2023-07-26T10:47:05+0200, Thomas ten Cate wrote:
> In the bash manual page (`man bash`), the ASCII tilde character '~'
> (0x7e) is replaced by the Unicode character '˜' (U+02DC SMALL TILDE):
> 
> $ man bash | grep 'additional binary operator'
>   An additional binary operator, =˜, is available,
> 
> The same happens for the use of ~ as a shorthand for the home
> directory. This makes the manual page incorrect, and difficult to
> search.
> 
> It looks like there is an ASCII tilde character in the man page's
> source code:
> 
> $ gunzip -c /usr/share/man/man1/bash.1.gz | grep 'additional binary operator'
> An additional binary operator, \fB=~\fP, is available, with the same
> 
> I don't know the first thing about groff, but `man groff_char`
> suggests that ~ is indeed rendered as "modifier tilde", and that one
> should write \(ti to obtain an actual tilde character.

I know a little about groff.  Your advice is fine for man pages that
target only groff[1] and/or mandoc[2], but not Heirloom Doctools
troff,[3] neatroff[4] or Plan 9 troff (in its original form or as
maintained in Plan 9 from User Space[5]), and not legacy implementations
descended from AT&T troff that are, as far as I can tell, unmaintained
by the few Unix System V vendors that still exist.[6][7]

Many projects don't need to worry about such extreme portability in
their man pages, but GNU Bash arguably does.  (I'm open to correction.)

Furthermore, in the *roff language itself, as originally implemented by
Joe Ossanna (and re-implemented by Brian Kernighan), there is no good
way to test for the existence of a special character.[8]

As a first stab at it, I'd divide the world into two camps: (a) groff
and mandoc(1), and (b) everything else, and not worry about (b).

The bash(1) man page has an extensive preamble already that still
includes a workaround for 4.3BSD(!), so adding a little bit to it to
accommodate systems developed since 1990 might not be too disruptive.

I'm attaching a straw man diff to the bash(1) page.  If Chet likes it,
I'm happy to prepare one against the bash devel branch.

bash(1) also attempts to select a font named "CW" in places, which is
another portability problem (it's a Unix System III [and later] troff
font name that was available on _some_ output devices).  But I'd like to
see how we get over this bridge before I try to cross that one.  :)

> I'm guessing the manpage is generated from texinfo, so if this is
> actually a bug in texinfo, feel free to forward this email to
> bug-texinfo at gnu.org.

I don't think that's actually true.  As far as I know, Chet maintains
Bash's Texinfo docs and man pages in parallel by hand.

Regards,
Branden

[1] https://www.gnu.org/software/groff/
[2] https://mandoc.bsd.lv/
[3] https://github.com/n-t-roff/heirloom-doctools
[4] https://github.com/aligrudi/neatroff
[5] https://github.com/9fans/plan9port

[6] HP-UX 11 appears to still ship an AT&T/DWB or System V troff.
Solaris 10 does, but it is nearing end-of-life and Solaris 11
replaced its troff (of similar lineage to HP-UX's) with groff.

[7] It is also not hard to make AT&T-descended troffs support the
`ha` and `ti` special characters.  For instance, here's a patch to
Documenter's Workbench (DWB) 3.3 troff's "Latin1" output device.

--- R.orig  2023-07-26 09:55:30.527340674 -0500
+++ R   2023-07-26 09:58:49.658662373 -0500
@@ -68,6 +68,7 @@
 bs "
 ]  33  3   93
 ^  33  2   147
+ha "
 ---  47  2   94
 ---  50  1   95
 `  33  2   96
@@ -101,6 +102,7 @@
 ---  20  2   124
 }  48  3   125
 ~  33  2   148
+ti "
 ---  54  0   126
 \` 33  2   145
 ga "

But even in the 30+ years since groff emerged on the scene, I'm not
aware of a single such troff having done this.

[8] A clever *roff hacker could try using the output comparison operator
and width computation escape sequence to measure the width of a candidate
special character, but this would not be reliable.  The output
drivers of AT&T device-independent troff appear to format
unrecognized characters as blanks (putting horizontal motions on the
output).  (groff does not, throwing an error diagnostic instead.)[9]
But if a special character did exist and happened to be the same
width as such a blank character, this test would produce a false
negative.  Worse, on nroff-mode devices, including the terminal
emulators on which 99% of all man page reading is done, _all_ glyphs
are the same width, so you'd get false negatives all the time.

[9] This is a groff/AT&T troff difference that I don't think is
documented by groff.  Maybe I should fix that.
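A toy C++ model makes the false-negative argument in [8] concrete. Everything in it is hypothetical. `Device`, the widths, and the function names are invented, and the roff-level output comparison is reduced to an integer comparison. The model also assumes two things. First, as [8] reports for AT&T troff, an unrecognized special character measures the same as a blank of fixed width. Second, on a terminal device every glyph occupies one cell of that same width.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical output device: special-character names mapped to widths in
// device units, plus the width an unrecognized character is assumed to
// occupy when formatted as a blank.
struct Device {
  std::map<std::string, int> glyph_widths;
  int blank_width;
};

// What a width measurement of the special character `name` would yield
// on this model device.
int measured_width(const Device& dev, const std::string& name) {
  auto it = dev.glyph_widths.find(name);
  if (it == dev.glyph_widths.end())
    return dev.blank_width;  // unrecognized: formatted as a blank
  return it->second;
}

// The width heuristic: call a character defined only if it measures
// differently from an unrecognized one.
bool probe_says_defined(const Device& dev, const std::string& name) {
  return measured_width(dev, name) != dev.blank_width;
}
```

On a typesetter-like device, a glyph whose width happens to match the blank width is misreported as missing. On a terminal-like device, every probe reports "missing".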
--- bash.1.orig	2023-07-26 10:19:18.770924818 -0500
+++ bash.1	2023-07-26 10:22:48.554457262 -0500
@@ -26,6 +26,22 @@
 .if !rzY .nr zY 0 \" avoid a warning about an undefined register
 .if \n(zZ=1 .ig zZ
 .if \n(zY=1 .ig zY
+.