Re: z/OS porting issues, UTF-8 support, and the groff man(1) page

2023-04-01 Thread Ralph Corderoy
Hi Dave,

> whereas < and > are pretty common for this and no one will bat an eye
> at those in non-UTF-8 contexts.

   ‘The angle-bracket "<" and ">" and double-quote (") characters are
excluded because they are often used as the delimiters around URI in
text documents and protocol fields.’
   — https://www.rfc-editor.org/rfc/rfc2396, §2.4.3


   ‘The recommendation is that the angle brackets (less than and greater
than signs) of the ASCII set be used for this purpose.

   ‘...

   ‘Example

   ‘Yes, Jim, I found it under  but
 you can probably pick it up from .’
— https://www.w3.org/Addressing/URL/5.1_Wrappers.html

-- 
Cheers, Ralph.

‘Short words are best and the old words when short are best of all.’
 — Winston Churchill



Proposed: stop subjecting right-hand sides of `char` family requests to character translation

2023-04-01 Thread Douglas McIlroy
I went to see what this proposal meant and ran into undefined jargon
in groff_char.7. Yes, info groff probably tells me more than I want to
know. Still, I expect the man page to be terse, but intelligible.

What's an "entity"?  Fortunately, Dave Kemper's post shed light on
this question.

The first use of .char that came to mind was
.char \[ntilde] \o'n~'
which would collide badly with the following ancient trick for
unbreakable, unpaddable space. (Ignore the question of whether the
tilde at hand is usable as a diacritical.)
.tr ~
a~b~c
This, I guess, is typical of the motivation for the change.

Suppose the change isn't made? What does .char do for you that .ds
doesn't? Certainly nothing essential in the example above. However, it
can avoid the ugliness of string invocations.

I regard the potential benefit mentioned in the last sentence as
unpersuasive, but the potential catastrophe of the initial example as
tilting the scales toward the proposal.

Doug



Re: z/OS porting issues, UTF-8 support, and the groff man(1) page

2023-04-01 Thread Mike Fulton
On Fri, Mar 31, 2023 at 2:55 PM G. Branden Robinson <
g.branden.robin...@gmail.com> wrote:

> [adding Dave to CC; seek your name below for my magical summons]
>
> At 2023-03-31T13:05:09-0700, Mike Fulton wrote:
> > On Fri, Mar 31, 2023 at 8:57 AM G. Branden Robinson <
> > > As a groff developer, I'm interested in minimizing the number of
> > > patches you have to carry "downstream" to support groff.
> > >
> > Definitely - I have not yet been able to build with the 'git' dev
> > build but instead have been building from the tarball. I was planning
> > to work to upstream changes once I had the 'git' build working (we are
> > getting there now that we have more tools in place - it's a circuitous
> > process!)
>
> When you're ready to make that shift, be sure to read the "INSTALL.REPO"
> file in the root of the repository or distribution archive.
>

Bruno Haible has provided an enhancement to gnu libiconv that now 'falls
back' to < and > from the mathematical angled brackets.
The net of that change is that 'man groff' now works for me, which is great!
I _do_ want to tackle the other things that are brought up here as well
(in particular getting a proper fix for my sed hack) and I want to figure
out how to build man so that I can get 'true' UTF-8 support in my man pages.

I am going to take a crack at getting the 'git build' going. I will reach
out once I have made progress with that. Hopefully it won't be too hard -
depends on how many other tools are required for bootstrap/configure. It
sounds like that may also help with my 'sed' problems (see below).

>
> > > I assume the change here:
> > >
> > >
> https://github.com/ZOSOpenTools/groffport/blob/main/patches/makevarescape.sed.patch
> > >
> > > is due to a limitation of the system's sed(1)?
> > >
> > Yes - that is the change. No - it's not because of sed. We have ported
> > sed and could rely on it as a dependency. The issue we hit is a bit
> > ugly.  Because z/OS is a 'multi-tenant' operating system, we want
> > people to be able to install into a particular location of their
> > choice (either as developer _or_ as a consumer of the binary).
>
> ...without a recompile, I assume?
>
Correct. Without a recompile.

>
> > To make that work, we run a post-process on the files when someone
> > downloads them to change the install 'root' location from where we
> > built the code to the target location they want to install into.  It's
> > ugly and we end up doing a find across files to do this trick. If that
> > 'sed' change is in there, we end up 'missing' some particular updates
> > because the string gets changed on us for the 'root' and so I took out
> > that sed update (a complete hack that I need to do better).
>
> Ah.  Hmm.  I can think of a better way, although it won't (completely)
> help groff 1.22.4.
>
> For groff 1.23, I revised our man pages to be much more careful about
> documenting full file specifications to groff-installed files and to
> compute their values based on the build's configuration
> parameters--stuff like "./configure --prefix=/home/foobar".
>
I will check this out - maybe the problem 'goes away' in 1.23.

>
> Something I think you could do starting with the 1.23.0 release
> candidates--if you keep the groff build tree around somewhere--is to
> perform your sed operation on all the *.man files in the source tree
> (and build tree, if it is separate), sniping any of the existing fodder
> for sed replacement that you find appropriate.
>
> To be concrete, I'm talking about this stuff:
>
>
> https://git.savannah.gnu.org/cgit/groff.git/tree/Makefile.am?id=e3824d611be904bad22176f4f4eb282a5352509d#n864
>
> So your multi-tenancy assistance script could do something like this:
>
> MANS=$(find groff-source-dir groff-build-dir -name "*.man")
> sed -i 's#@BINDIR@#'"$TENANT_HOME"'/bin#g' $MANS
> cd groff-build-dir
> make man-all # You can thank Keith Marshall for suggesting this.
>
I will try the 'git build' first and see what that looks like.

>
> ...and as Emeril Lagasse would say, "bam!"  The pages will be
> regenerated with correct file specifications with no cumbersome
> workarounds.  And thanks to makevarescape.sed, if the file names wind up
> being long, they'll break in pleasant locations and won't be hyphenated.
>
> Or so I predict, not having actually done this concretely.
>
> If you're wondering why you need to search both the build and source
> directories for .man documents, that's my fault.
>
>
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=31536c517dfe49b4e4a715a732f76b701531e90a
>
> > > Interestingly, this meshes closely with groff's assumptions.  Due to
> > > its chronological origins ca. 1990, it does not accept UTF-8 input,
> > > but it is aware of UTF-8 and can produce it as output.  The formatter,
> > > troff(1), accepts ISO Latin-1 input, except on systems where the C
> > > preprocessor macro "IS_EBCDIC_HOST" evaluates true; it then assumes
> > > that its input is encoded using code page 1047.
> > >
> > From my perspecti

Re: Proposed: stop subjecting right-hand sides of `char` family requests to character translation

2023-04-01 Thread G. Branden Robinson
Hi Doug,

At 2023-04-01T19:45:19-0400, Douglas McIlroy wrote:
> I went to see what this proposal meant and ran into undefined jargon
> in groff_char.7.

This, and phrases like "in the actual version", are regrettable defects
in the groff 1.22.4 version of this man page.

The one in the groff 1.23.0.rc2 and .rc3 release candidates does not
have them.  This page is one that I've heavily revised.  I'm attaching a
copy for your consideration.  I'd particularly welcome your comments on
the new "History" section.

> Yes, info groff probably tells me more than I want to know. Still, I
> expect the man page to be terse, but intelligible.

Fair.  I hope the intelligibility of the present form is improved.

> What's an "entity"?

Suggestive of conceptual fuzziness on the part of the writer, I would
propose.  But I can't blame them; the difficulty of comprehending
groff's flexible and complex character to glyph transformation process
is the main reason I have not yet revised that part of our Texinfo
manual.

> Fortunately, Dave Kemper's post shed light on this question.
> 
> The first use of .char that came to mind was
> .char \[ntilde] \o'n~'
> which would collide badly with the following ancient trick for
> unbreakable, unpaddable space. (Ignore the question of whether the
> tilde at hand is usable as a diacritical.)
> .tr ~
> a~b~c

You may be one of a dwindling number of people for whom that ancient
trick comes to mind.  :)  But we do continue to support it, and I see no
reason to withdraw it.

> This, I guess, is typical of the motivation for the change.

I was spurred into this by noticing a problem last July with what I
think was a historical troff document.  I can't lay my hands on it now, but
the following short example suggests the issue.

$ cat EXPERIMENTS/tr-in-env.roff
.nf
.tr ab
bab
.ev 1
bab
.br
.ev
bab
.pl \n(nlu

This produces 3 lines of "bbb".

The problem I observed, as best I can recall, was that a document
temporarily used `tr` to make input more convenient.

The trouble was, the same character they were translating turned up in
one of their page headers or footers.

So, depending on how the document got modified and the resulting
placement of the `tr`-ed material, the headers/footers might get
corrupted or might not.

A lengthier, but contrived, example of this is at
.

I suppose there are workarounds one could coach the user to undertake in
such a situation, but once I got to thinking about it, it struck me that
there should be a cleaner division of responsibility between `tr` and
`char`.

My suggestion is twofold: (1) that `tr` should be used for permuting
what we can term groff's internal character set; meaning the 94
printable characters of ASCII/Basic Latin, and whatever special
characters happen to be defined; and (2) `char` and `rchar` are for
adding and removing members of the set of special characters.  (You can
try to `rchar` an ordinary Basic Latin character; it will silently fail.
I mean to make that no longer silent.[1])

It is necessary to consider the impact of these processes on diversions.
I don't presently think my proposal is disruptive to the status quo in
that respect.  When a diversion is populated, special character
definitions are already resolved, and just as with string
interpolations, using the `unformat` request does not recover their
original forms.

Illustration (with groff 1.22.4):

$ cat EXPERIMENTS/char-in-a-diversion.groff
.nf
.char \[zz] FNORD
.di XX
You didn't \[zz] this.
.di
Hello, world.
diverted XX: \c
.XX
.unformat XX
unformatted XX: \*[XX]
.pl \n[nl]u
$ nroff -Tascii EXPERIMENTS/char-in-a-diversion.groff
Hello, world.
diverted XX: You didn't FNORD this.
unformatted XX: You didn't FNORD this.

$

> Suppose the change isn't made? What does .char do for you that .ds
> doesn't? Certainly nothing essential in the example above. However, it
> can avoid the ugliness of string invocations.

I don't remember where I saw this trick, but you can use a
`char`-defined object as a margin character, and I suppose just about
anywhere else the language syntax is accepting of an atomic character.
The utility of this comes in when realizing that someone might
reasonably want to set a margin character in a particular typeface
(maybe it's a dingbat--most of these don't have special character names)
and/or in a certain color.

Recasting the language of the 1.22.4 Texinfo manual, `char` is described
as doing this to the RHS of its definition: "[the RHS] is processed in a
temporary environment and the result is wrapped up into a single object.
Compatibility mode is turned off and the escape character is set to '\'
while [it] is being processed.  Any emboldening, constant spacing or
track kerning is applied to this object rather than to individual
characters in [it]."

> I regard the potential benefit mentioned in the last sentence as
> unpersuasive, but the potential catastrophe of the initial example as
> tilting the sc

Re: z/OS porting issues, UTF-8 support, and the groff man(1) page

2023-04-01 Thread G. Branden Robinson
At 2023-04-01T16:47:25-0700, Mike Fulton wrote:
> On Fri, Mar 31, 2023 at 2:55 PM G. Branden Robinson <
> g.branden.robin...@gmail.com> wrote:
> > When you're ready to make that shift, be sure to read the
> > "INSTALL.REPO" file in the root of the repository or distribution
> > archive.
> >
> 
> Bruno Haible has provided an enhancement to gnu libiconv that now
> 'falls back' to < and > from the mathematical angled brackets.  The
> net of that change is that 'man groff' now works for me, which is
> great!

Glad to hear it!  I've got a change stashed to add fallbacks on the
groff side too.  There's not much to it.

$ git stash show -p 0
diff --git a/tmac/tty.tmac b/tmac/tty.tmac
index 35a527c32..2a28a7dd2 100644
--- a/tmac/tty.tmac
+++ b/tmac/tty.tmac
@@ -51,6 +51,8 @@
 .fchar \[lA] <=
 .fchar \[rA] =>
 .fchar \[hA] <=>
+.fchar \[la] <
+.fchar \[ra] >
 .fchar \[rg] (R)
 .fchar \[OE] OE
 .fchar \[oe] oe

> I am going to take a crack at getting the 'git build' going. I will
> reach out once I have made progress with that. Hopefully it won't be
> too hard - depends on how many other tools are required for
> bootstrap/configure. It sounds like that may also help with my 'sed'
> problems (see below).

The build dependencies for groff 1.23.0.rc2 and later distribution
archives are, on net, _lighter_ than for groff 1.22.4; you no longer
need a TeX installation.  (You do need an m4 program, though.)

Some background on this can be found at
.

> > > Yes - that is the change. No - it's not because of sed. We have
> > > ported sed and could rely on it as a dependency. The issue we hit
> > > is a bit ugly.  Because z/OS is a 'multi-tenant' operating system,
> > > we want people to be able to install into a particular location of
> > > their choice (either as developer _or_ as a consumer of the
> > > binary).
> >
> > ...without a recompile, I assume?
> >
> Correct. Without a recompile.

Without a recompile means without re-./configure-ing, so I think you'll
need my build-tree alteration trick.  If your multi-tenancy arrangement
keeps a copy of, say, a generic groff build which can then be copied to
some staging area for user customization, I reckon you could _either_
re-configure and rebuild, or do the simpler sed trick I suggested.
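
For the archive, here is a minimal sketch of what that sed trick might
look like as a script.  It is an illustration only: the function name
retarget_man_sources, the /home/tenant prefix, and the throwaway demo
tree are hypothetical, and it assumes the tenant's prefix contains no
'#' or '&' characters.  It writes through a temporary file rather than
using "sed -i", which not every sed supports.  Regenerating the pages
afterward ("make man-all" in the build tree) is not exercised here.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the @BINDIR@ placeholder in every *.man
# file found beneath the given directories.
retarget_man_sources() {
    tenant_home=$1
    shift
    find "$@" -name '*.man' -type f | while IFS= read -r page; do
        # Write to a temporary file and move it into place, since
        # in-place editing (sed -i) is not available in every sed.
        sed "s#@BINDIR@#$tenant_home/bin#g" "$page" > "$page.tmp" &&
            mv "$page.tmp" "$page"
    done
}

# Demonstration on a throwaway tree (stands in for the source and
# build directories).
demo=$(mktemp -d)
printf '%s\n' '.B @BINDIR@/groff' > "$demo/groff.1.man"
retarget_man_sources /home/tenant "$demo"
cat "$demo/groff.1.man"    # prints: .B /home/tenant/bin/groff
rm -rf "$demo"
```

A real script would pass both the source and build directories as
arguments, since .man documents can live in either.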

> > For groff 1.23, I revised our man pages to be much more careful
> > about documenting full file specifications to groff-installed files
> > and to compute their values based on the build's configuration
> > parameters--stuff like "./configure --prefix=/home/foobar".
> >
> I will check this out - maybe the problem 'goes away' in 1.23.

I don't _expect_ it to unless you re-configure and build for each user's
scenario...at least if groff gets installed to a location that
identifies the user, as '/home/branden' does, for example.  If, instead,
you have a more tightly compartmentalized user-specific view of the
system that also uses fixed directory names for the software packages
(e.g., groff is always in /opt/groff, but only users that have selected
it will see it) you might indeed benefit from the groff 1.23
improvements here.  :)

> I will try the 'git build' first and see what that looks like.

I'm eager to hear your experience.

> > And thanks to makevarescape.sed, if the file names wind up being
> > long, they'll break in pleasant locations and won't be hyphenated.

I need to correct myself here.  My suggestion bypasses
makevarescape.sed, so the post-build rewritten @BINDIR@ trick will not
be protected from automatic hyphenation or have good hyphenless break
points in it.  However, if you do end up going back to a sed solution,
it's possible to put those in yourself in an automated way.

The idea is to do what the last line of makevarescape.sed is already
doing: prefix the file specification with (as they appear in the
generated man page document) '\%' and place '\:' after every sequence of
forward slash characters.
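
A minimal sketch of that transformation, for illustration.  This is not
the actual content of makevarescape.sed; the function name protect_path
and the sample path are hypothetical.  It only shows the two steps
described above applied to a single file specification.

```shell
#!/bin/sh
# Hypothetical helper: make a file specification safe for man page text.
protect_path() {
    # 1. Append the break-point escape \: after each run of slashes.
    # 2. Prepend \% so the formatter does not hyphenate the name.
    printf '%s\n' "$1" | sed -e 's|//*|&\\:|g' -e 's|^|\\%|'
}

protect_path /opt/groff/bin    # prints: \%/\:opt/\:groff/\:bin
```

The output of such a helper could then be used as the replacement text
in the @BINDIR@ substitution, provided the backslashes are escaped once
more for sed's replacement syntax.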

> Over the years, the operating system has evolved from MVS

When I first got on the Internet, I encountered this initialism at the
same time as VMS and VM/CMS, which just about broke me.

> to OS/390 to z/OS.  What is shipped with the operating system has
> evolved too. Up until the 80's, there was no POSIX environment
> available.

Would have been difficult; there was no POSIX before 1988.  :)

> That was added in the early 90's as 'Open Edition'. Back in the 90's
> it was optional, but now, it's always available on the z/OS system
> (although you can still restrict users to not be able to _use_ the
> POSIX environment if you want).  So, we now have lots of names for the
> same thing: OS/390 Unix, Unix System Services, Open Edition. Some
> services still spit out the old names (so that tools don't get broken)
> so you will see comparisons to 'OS/390' and sometimes to 'z/OS'.  It's
> important to note that the hardware (e.g., a Z16) runs a variety of
> operating systems including Linux, z/OS, z/VM, z/TPF, z/VSE. The