Hi Thomas,

At 2023-07-26T10:47:05+0200, Thomas ten Cate wrote:
> In the bash manual page (`man bash`), the ASCII tilde character '~'
> (0x7e) is replaced by the Unicode character '˜' (U+02DC SMALL TILDE):
> 
>     $ man bash | grep 'additional binary operator'
>                   An additional binary operator, =˜, is available,
> 
> The same happens for the use of ~ as a shorthand for the home
> directory. This makes the manual page incorrect, and difficult to
> search.
> 
> It looks like there is an ASCII tilde character in the man page's
> source code:
> 
>     $ gunzip -c /usr/share/man/man1/bash.1.gz | grep 'additional binary operator'
>     An additional binary operator, \fB=~\fP, is available, with the same
> 
> I don't know the first thing about groff, but `man groff_char`
> suggests that ~ is indeed rendered as "modifier tilde", and that one
> should write \(ti to obtain an actual tilde character.
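
To make the search problem concrete, here is a minimal shell sketch.  It
assumes a POSIX sh with printf and grep, and the helper name
count_matches is made up for illustration.  It compares the sentence as
rendered with U+02DC against the same sentence with an ASCII tilde.

```shell
#!/bin/sh
# Two renderings of the same sentence from bash(1).  The octal escapes
# \313\234 are the UTF-8 encoding of U+02DC SMALL TILDE.
small_tilde=$(printf 'An additional binary operator, =\313\234, is available,')
ascii_tilde='An additional binary operator, =~, is available,'

# count_matches TEXT: print how many lines of TEXT contain a literal "=~".
# grep -c exits nonzero when nothing matches, hence the "|| true".
count_matches() {
    printf '%s\n' "$1" | grep -c -F '=~' || true
}

echo "U+02DC rendering: $(count_matches "$small_tilde") matching line(s)"
echo "U+007E rendering: $(count_matches "$ascii_tilde") matching line(s)"
```

The first count is 0 and the second is 1, which is why searching the
rendered page for `=~` comes up empty.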

I know a little about groff.  Your advice is fine for man pages that
target only groff[1] and/or mandoc[2], but not Heirloom Doctools
troff,[3] neatroff[4] or Plan 9 troff (in its original form or as
maintained in Plan 9 from User Space[5]), and not legacy implementations
descended from AT&T troff that are, as far as I can tell, unmaintained
by the few Unix System V vendors that still exist.[6][7]

Many projects don't need to worry about such extreme portability in
their man pages, but GNU Bash arguably does.  (I'm open to correction.)

Furthermore, in the *roff language itself, as originally implemented by
Joe Ossanna (and re-implemented by Brian Kernighan), there is no good
way to test for the existence of a special character.[8]

As a first stab at it, I'd divide the world into two camps: (a) groff
and mandoc(1), and (b) everything else, and not worry about (b), which
would simply keep the plain `^` and `~` it gets today.

The bash(1) man page has an extensive preamble already that still
includes a workaround for 4.3BSD(!), so adding a little bit to it to
accommodate systems developed since 1990 might not be too disruptive.

I'm attaching a straw man diff to the bash(1) page.  If Chet likes it,
I'm happy to prepare one against the bash devel branch.
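
For anyone adapting the diff to another version of the page, a rough way
to list lines that still contain a bare `~` or `^` might look like the
sketch below.  This is not how the attached diff was produced; the helper
name find_bare_marks and the sample input are hypothetical, and each hit
still needs a human to decide whether it is text to be rendered.

```shell
#!/bin/sh
# find_bare_marks FILE: list lines of a roff source that contain a literal
# "~" or "^", skipping roff comment lines (those beginning with .\").
# The "|| true" keeps the function from failing when nothing is found.
find_bare_marks() {
    grep -n '[~^]' "$1" | grep -v '^[0-9]*:\.\\"' || true
}

# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
cat >"$sample" <<'EOF'
.\" a comment that mentions ~ is skipped
.ds ti ~
.I \*(ti/.bashrc
.IR ~/.profile ,
The default is `\fB^\fP'.
EOF

# Prints lines 2, 4, and 5.  Line 2 is a fallback string definition of
# the kind the preamble needs, so it is an expected hit, not an error.
find_bare_marks "$sample"
rm -f "$sample"
```

Lines already converted to `\*(ti` or `\*(ha` no longer show up, so the
list shrinks as the conversion proceeds.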

bash(1) also attempts to select a font named "CW" in places, which is
another portability problem (it's a Unix System III [and later] troff
font name that was available on _some_ output devices).  But I'd like to
see how we get over this bridge before I try to cross that one.  :)

> I'm guessing the manpage is generated from texinfo, so if this is
> actually a bug in texinfo, feel free to forward this email to
> bug-texinfo at gnu.org.

I don't think that's actually true.  As far as I know, Chet maintains
Bash's Texinfo docs and man pages in parallel by hand.

Regards,
Branden

[1] https://www.gnu.org/software/groff/
[2] https://mandoc.bsd.lv/
[3] https://github.com/n-t-roff/heirloom-doctools
[4] https://github.com/aligrudi/neatroff
[5] https://github.com/9fans/plan9port

[6] HP-UX 11 appears to still ship an AT&T/DWB or System V troff.
    Solaris 10 does too, but it is nearing end-of-life, and Solaris 11
    replaced its troff (of similar lineage to HP-UX's) with groff.

[7] It is also not hard to make AT&T-descended troffs support the
    `ha` and `ti` special characters.  For instance, here's a patch to
    Documenter's Workbench (DWB) 3.3 troff's "Latin1" output device.

--- R.orig      2023-07-26 09:55:30.527340674 -0500
+++ R   2023-07-26 09:58:49.658662373 -0500
@@ -68,6 +68,7 @@
 bs     "
 ]      33      3       93
 ^      33      2       147
+ha     "
 ---    47      2       94
 ---    50      1       95
 `      33      2       96
@@ -101,6 +102,7 @@
 ---    20      2       124
 }      48      3       125
 ~      33      2       148
+ti     "
 ---    54      0       126
 \`     33      2       145
 ga     "

    But even 30+ years after groff emerged on the scene, I'm not
    aware of a single such troff having done this.

[8] A clever *roff hacker could try using the output comparison operator
    and width computation escape sequence to measure the width of a
    candidate special character, but this would not be reliable.  The
    output drivers of AT&T device-independent troff appear to format
    unrecognized characters as blanks (putting horizontal motions on the
    output).  (groff does not, throwing an error diagnostic instead.)[9]
    But if a special character did exist and happened to be the same
    width as such a blank character, this test would produce a false
    negative.  Worse, on nroff-mode devices, including the terminal
    emulators where 99% of all man page reading is done, _all_ glyphs
    are the same width, so you'd get false negatives all the time.

[9] This is a groff/AT&T troff difference that I don't think is
    documented by groff.  Maybe I should fix that.
--- bash.1.orig	2023-07-26 10:19:18.770924818 -0500
+++ bash.1	2023-07-26 10:22:48.554457262 -0500
@@ -26,6 +26,22 @@
 .if !rzY .nr zY 0 \" avoid a warning about an undefined register
 .if \n(zZ=1 .ig zZ
 .if \n(zY=1 .ig zY
+.
+.\" Use \(ha and \(ti special characters where available, for better
+.\" typography and pattern matching on devices with large glyph
+.\" repertoires, like Unicode terminals and PDF.
+.\"
+.\" GNU troff (and implementations claiming compatibility with it)
+.\" predefine a register `.g`; mandoc(1)'s `.f` register (mounting
+.\" position of the selected font) is always zero, whereas no troff ever
+.\" starts `.f` out with that value.
+.ds ha ^
+.ds ti ~
+.if \n(.g:(\n(.f=0) \{\
+.  ds ha \(ha
+.  ds ti \(ti
+.\}
+.
 .TH BASH 1 "2020 October 29" "GNU Bash 5.1"
 .\"
 .\" There's some problem with having a `@'
@@ -205,7 +221,7 @@
 instead of the system wide initialization file
 .I /etc/bash.bashrc
 and the standard personal initialization file
-.I ~/.bashrc
+.I \*(ti/.bashrc
 if the shell is interactive (see
 .SM
 .B INVOCATION
@@ -223,10 +239,10 @@
 Do not read either the system-wide startup file
 .FN /etc/profile
 or any of the personal initialization files
-.IR ~/.bash_profile ,
-.IR ~/.bash_login ,
+.IR \*(ti/.bash_profile ,
+.IR \*(ti/.bash_login ,
 or
-.IR ~/.profile .
+.IR \*(ti/.profile .
 By default,
 .B bash
 reads these files when it is invoked as a login shell (see
@@ -238,7 +254,7 @@
 Do not read and execute the system wide initialization file
 .I /etc/bash.bashrc
 and the personal initialization file
-.I ~/.bashrc
+.I \*(ti/.bashrc
 if the shell is interactive.
 This option is on by default if the shell is invoked as
 .BR sh .
@@ -337,8 +353,8 @@
 with the \fB\-\-login\fP option, it first reads and
 executes commands from the file \fI/etc/profile\fP, if that
 file exists.
-After reading that file, it looks for \fI~/.bash_profile\fP,
-\fI~/.bash_login\fP, and \fI~/.profile\fP, in that order, and reads
+After reading that file, it looks for \fI\*(ti/.bash_profile\fP,
+\fI\*(ti/.bash_login\fP, and \fI\*(ti/.profile\fP, in that order, and reads
 and executes commands from the first one that exists and is readable.
 The
 .B \-\-noprofile
@@ -347,12 +363,12 @@
 When an interactive login shell exits,
 or a non-interactive login shell executes the \fBexit\fP builtin command,
 .B bash
-reads and executes commands from the file \fI~/.bash_logout\fP, if it
+reads and executes commands from the file \fI\*(ti/.bash_logout\fP, if it
 exists.
 .PP
 When an interactive shell that is not a login shell is started,
 .B bash
-reads and executes commands from \fI/etc/bash.bashrc\fP and \fI~/.bashrc\fP,
+reads and executes commands from \fI/etc/bash.bashrc\fP and \fI\*(ti/.bashrc\fP,
 if these files exist.
 This may be inhibited by using the
 .B \-\-norc
@@ -360,7 +376,7 @@
 The \fB\-\-rcfile\fP \fIfile\fP option will force
 .B bash
 to read and execute commands from \fIfile\fP instead of
-\fI/etc/bash.bashrc\fP and \fI~/.bashrc\fP.
+\fI/etc/bash.bashrc\fP and \fI\*(ti/.bashrc\fP.
 .PP
 When
 .B bash
@@ -396,7 +412,7 @@
 read and execute commands from
 .I /etc/profile
 and
-.IR ~/.profile ,
+.IR \*(ti/.profile ,
 in that order.
 The
 .B \-\-noprofile
@@ -446,7 +462,7 @@
 If
 .B bash
 determines it is being run in this fashion, it reads and executes
-commands from \fI~/.bashrc\fP and \fI~/.bashrc\fP, if these files
+commands from \fI\*(ti/.bashrc\fP and \fI\*(ti/.bashrc\fP, if these files
 exist and are readable.
 It will not do this if invoked as \fBsh\fP.
 The
@@ -761,7 +777,7 @@
 to be matched as a string.
 .if t .sp 0.5
 .if n .sp 1
-An additional binary operator, \fB=~\fP, is available, with the same
+An additional binary operator, \fB=\*(ti\fP, is available, with the same
 precedence as \fB==\fP and \fB!=\fP.
 When it is used, the string to the right of the operator is considered
 a POSIX extended regular expression and matched accordingly
@@ -784,7 +800,7 @@
 .if t .sp 0.5
 .if n .sp 1
 The pattern will match if it matches any part of the string.
-Anchor the pattern using the \fB^\fP and \fB$\fP regular expression
+Anchor the pattern using the \fB\*(ha\fP and \fB$\fP regular expression
 operators to force it to match the entire string.
 The array variable
 .SM
@@ -1629,7 +1645,7 @@
 command.
 .TP
 .B BASH_REMATCH
-An array variable whose members are assigned by the \fB=~\fP binary
+An array variable whose members are assigned by the \fB=\*(ti\fP binary
 operator to the \fB[[\fP conditional command.
 The element with index 0 is the portion of the string
 matching the entire regular expression.
@@ -2087,7 +2103,7 @@
 If this parameter is set when \fBbash\fP is executing a shell script,
 its value is interpreted as a filename containing commands to
 initialize the shell, as in
-.IR ~/.bashrc .
+.IR \*(ti/.bashrc .
 The value of
 .SM
 .B BASH_ENV
@@ -2128,8 +2144,8 @@
 .B cd
 command.
 A sample value is
-.if t \f(CW".:~:/usr"\fP.
-.if n ".:~:/usr".
+.if t \f(CW".:\*(ti:/usr"\fP.
+.if n ".:\*(ti:/usr".
 .TP
 .B CHILD_MAX
 Set the number of exited child status values for the shell to remember.
@@ -2199,8 +2215,8 @@
 .B FIGNORE
 is excluded from the list of matched filenames.
 A sample value is
-.if t \f(CW".o:~"\fP.
-.if n ".o:~"
+.if t \f(CW".o:\*(ti"\fP.
+.if n ".o:\*(ti"
 (Quoting is needed when assigning a value to this variable,
 which contains tildes).
 .TP
@@ -2254,7 +2270,7 @@
 The name of the file in which command history is saved (see
 .SM
 .B HISTORY
-below).  The default value is \fI~/.bash_history\fP.  If unset, the
+below).  The default value is \fI\*(ti/.bash_history\fP.  If unset, the
 command history is not saved when a shell exits.
 .TP
 .B HISTFILESIZE
@@ -2367,7 +2383,7 @@
 The filename for the
 .B readline
 startup file, overriding the default of
-.FN ~/.inputrc
+.FN \*(ti/.inputrc
 (see
 .SM
 .B READLINE
@@ -2447,7 +2463,7 @@
 Example:
 .RS
 .PP
-\fBMAILPATH\fP=\(aq/var/mail/bfox?"You have mail":~/shell\-mail?"$_ has mail!"\(aq
+\fBMAILPATH\fP=\(aq/var/mail/bfox?"You have mail":\*(ti/shell\-mail?"$_ has mail!"\(aq
 .PP
 .B Bash
 can be configured to supply
@@ -2675,7 +2691,7 @@
 The second character is the \fIquick substitution\fP
 character, which is used as shorthand for re-running the previous
 command entered, substituting one string for another in the command.
-The default is `\fB^\fP'.
+The default is `\fB\*(ha\fP'.
 The optional third character is the character
 which indicates that the remainder of the line is a comment when found
 as the first character of a word, normally `\fB#\fP'.  The history
@@ -2979,7 +2995,7 @@
 .B SHELL BUILTIN COMMANDS
 below).
 .SS Tilde Expansion
-If a word begins with an unquoted tilde character (`\fB~\fP'), all of
+If a word begins with an unquoted tilde character (`\fB\*(ti\fP'), all of
 the characters preceding the first unquoted slash (or all characters,
 if there is no unquoted slash) are considered a \fItilde-prefix\fP.
 If none of the characters in the tilde-prefix are quoted, the
@@ -2997,11 +3013,11 @@
 Otherwise, the tilde-prefix is replaced with the home directory
 associated with the specified login name.
 .PP
-If the tilde-prefix is a `~+', the value of the shell variable
+If the tilde-prefix is a `\*(ti+', the value of the shell variable
 .SM
 .B PWD
 replaces the tilde-prefix.
-If the tilde-prefix is a `~\-', the value of the shell variable
+If the tilde-prefix is a `\*(ti\-', the value of the shell variable
 .SM
 .BR OLDPWD ,
 if it is set, is substituted.
@@ -3354,10 +3370,10 @@
 the substitution operation is applied to each member of the
 array in turn, and the expansion is the resultant list.
 .TP
-${\fIparameter\fP\fB^\fP\fIpattern\fP}
+${\fIparameter\fP\fB\*(ha\fP\fIpattern\fP}
 .PD 0
 .TP
-${\fIparameter\fP\fB^^\fP\fIpattern\fP}
+${\fIparameter\fP\fB\*(ha\*(ha\fP\fIpattern\fP}
 .TP
 ${\fIparameter\fP\fB,\fP\fIpattern\fP}
 .TP
@@ -3370,11 +3386,11 @@
 Each character in the expanded value of \fIparameter\fP is tested against
 \fIpattern\fP, and, if it matches the pattern, its case is converted.
 The pattern should not attempt to match more than one character.
-The \fB^\fP operator converts lowercase letters matching \fIpattern\fP
+The \fB\*(ha\fP operator converts lowercase letters matching \fIpattern\fP
 to uppercase; the \fB,\fP operator converts matching uppercase letters
 to lowercase.
-The \fB^^\fP and \fB,,\fP expansions convert each matched character in the
-expanded value; the \fB^\fP and \fB,\fP expansions match and convert only
+The \fB\*(ha\*(ha\fP and \fB,,\fP expansions convert each matched character in the
+expanded value; the \fB\*(ha\fP and \fB,\fP expansions match and convert only
 the first character in the expanded value.
 If \fIpattern\fP is omitted, it is treated like a \fB?\fP, which matches
 every character.
@@ -3796,7 +3812,7 @@
 is a
 .B !
 or a
-.B ^
+.B \*(ha
 then any character not enclosed is matched.
 The sorting order of characters in range expressions is determined by
 the current locale and the values of the
@@ -4530,7 +4546,7 @@
 .B ++\fIid\fP \-\-\fIid\fP
 variable pre-increment and pre-decrement
 .TP
-.B ! ~
+.B ! \*(ti
 logical and bitwise negation
 .TP
 .B **
@@ -4554,7 +4570,7 @@
 .B &
 bitwise AND
 .TP
-.B ^
+.B \*(ha
 bitwise exclusive OR
 .TP
 .B |
@@ -4569,7 +4585,7 @@
 .B \fIexpr\fP?\fIexpr\fP:\fIexpr\fP
 conditional operator
 .TP
-.B = *= /= %= += \-= <<= >>= &= ^= |=
+.B = *= /= %= += \-= <<= >>= &= \*(ha= |=
 assignment
 .TP
 .B \fIexpr1\fP , \fIexpr2\fP
@@ -5214,14 +5230,14 @@
 Typing the
 .I suspend
 character (typically
-.BR ^Z ,
+.BR \*(haZ ,
 Control-Z) while a process is running
 causes that process to be stopped and returns control to
 .BR bash .
 Typing the
 .I "delayed suspend"
 character (typically
-.BR ^Y ,
+.BR \*(haY ,
 Control-Y) causes the process to be stopped when it
 attempts to read input from the terminal, and control to
 be returned to
@@ -5233,7 +5249,7 @@
 command to continue it in the foreground, or
 the
 .B kill
-command to kill it.  A \fB^Z\fP takes effect immediately,
+command to kill it.  A \fB\*(haZ\fP takes effect immediately,
 and has the additional side effect of causing pending output
 and typeahead to be discarded.
 .PP
@@ -5545,7 +5561,7 @@
 .SM
 .B INPUTRC
 variable.  If that variable is unset, the default is
-.IR ~/.inputrc .
+.IR \*(ti/.inputrc .
 If that file  does not exist or cannot be read, the ultimate default is
 .IR /etc/inputrc .
 When a program which uses the readline library starts up, the
@@ -5644,7 +5660,7 @@
 .br
 "\eC\-x\eC\-r": re\-read\-init\-file
 .br
-"\ee[11~": "Function Key 1"
+"\ee[11\*(ti": "Function Key 1"
 .RE
 .PP
 In this example,
@@ -5655,7 +5671,7 @@
 is bound to the function
 .BR re\-read\-init\-file ,
 and
-.I "ESC [ 1 1 ~"
+.I "ESC [ 1 1 \*(ti"
 is bound to insert the text
 .if t \f(CWFunction Key 1\fP.
 .if n ``Function Key 1''.
@@ -6367,7 +6383,7 @@
 .B HISTORY EXPANSION
 below for a description of history expansion.
 .TP
-.B history\-expand\-line (M\-^)
+.B history\-expand\-line (M\-\*(ha)
 Perform history expansion on the current line.
 See
 .SM
@@ -6583,7 +6599,7 @@
 .B Bash
 attempts completion treating the text as a variable (if the
 text begins with \fB$\fP), username (if the text begins with
-\fB~\fP), hostname (if the text begins with \fB@\fP), or
+\fB\*(ti\fP), hostname (if the text begins with \fB@\fP), or
 command (including aliases and functions) in turn.  If none
 of these produces a match, filename completion is attempted.
 .TP
@@ -6628,11 +6644,11 @@
 List the possible completions of the text before point,
 treating it as a filename.
 .TP
-.B complete\-username (M\-~)
+.B complete\-username (M\-\*(ti)
 Attempt completion on the text before point, treating
 it as a username.
 .TP
-.B possible\-username\-completions (C\-x ~)
+.B possible\-username\-completions (C\-x \*(ti)
 List the possible completions of the text before point,
 treating it as a username.
 .TP
@@ -7037,7 +7053,7 @@
 the variable
 .SM
 .B HISTFILE
-(default \fI~/.bash_history\fP).
+(default \fI\*(ti/.bash_history\fP).
 The file named by the value of
 .SM
 .B HISTFILE
@@ -7265,13 +7281,13 @@
 If \fIstring\fP is missing, the string from the most recent search is used;
 it is an error if there is no previous search string.
 .TP
-.B \d\s+2^\s-2\u\fIstring1\fP\d\s+2^\s-2\u\fIstring2\fP\d\s+2^\s-2\u
+.B \d\s+2\*(ha\s-2\u\fIstring1\fP\d\s+2\*(ha\s-2\u\fIstring2\fP\d\s+2\*(ha\s-2\u
 Quick substitution.  Repeat the previous command, replacing
 .I string1
 with
 .IR string2 .
 Equivalent to
-``!!:s\d\s+2^\s-2\u\fIstring1\fP\d\s+2^\s-2\u\fIstring2\fP\d\s+2^\s-2\u''
+``!!:s\d\s+2\*(ha\s-2\u\fIstring1\fP\d\s+2\*(ha\s-2\u\fIstring2\fP\d\s+2\*(ha\s-2\u''
 (see \fBModifiers\fP below).
 .TP
 .B !#
@@ -7283,7 +7299,7 @@
 .B :
 separates the event specification from the word designator.
 It may be omitted if the word designator begins with a
-.BR ^ ,
+.BR \*(ha ,
 .BR $ ,
 .BR * ,
 .BR \- ,
@@ -7302,7 +7318,7 @@
 .I n
 The \fIn\fRth word.
 .TP
-.B ^
+.B \*(ha
 The first argument.  That is, word 1.
 .TP
 .B $
@@ -10987,7 +11003,7 @@
 .PD 0
 .RS
 .IP \(bu
-quoting the rhs of the \fB[[\fP command's regexp matching operator (=~)
+quoting the rhs of the \fB[[\fP command's regexp matching operator (=\*(ti)
 has no special effect
 .RE
 .PD
@@ -11227,7 +11243,7 @@
 \fIPortable Operating System Interface (POSIX) Part 2: Shell and Utilities\fP, IEEE --
 http://pubs.opengroup.org/onlinepubs/9699919799/
 .TP
-http://tiswww.case.edu/~chet/bash/POSIX -- a description of posix mode
+http://tiswww.case.edu/\*(tichet/bash/POSIX -- a description of posix mode
 .TP
 \fIsh\fP(1), \fIksh\fP(1), \fIcsh\fP(1)
 .TP
@@ -11250,16 +11266,16 @@
 .FN /etc/bash.bash.logout
 The systemwide login shell cleanup file, executed when a login shell exits
 .TP
-.FN ~/.bash_profile
+.FN \*(ti/.bash_profile
 The personal initialization file, executed for login shells
 .TP
-.FN ~/.bashrc
+.FN \*(ti/.bashrc
 The individual per-interactive-shell startup file
 .TP
-.FN ~/.bash_logout
+.FN \*(ti/.bash_logout
 The individual login shell cleanup file, executed when a login shell exits
 .TP
-.FN ~/.inputrc
+.FN \*(ti/.inputrc
 Individual \fIreadline\fP initialization file
 .PD
 .SH AUTHORS
