[redirecting to groff@gnu discussion list]

Hi Mike,

At 2023-03-30T20:51:06-0700, Mike Fulton wrote:
> When I perform 'man groff', I am getting a failure when iconv converts
> from UTF-8 to ISO8859-1 on the groff 1.22.4 17 December 2018 built
> tarball. The angle brackets from the .UR and .UE entries do not
> convert properly (they show up as spaces and I see a failure after
> quitting 'man')
>
> I have reduced this down to the angled brackets being generated in UTF-8
> that can't be converted to ISO8859-1, in particular for the text:
>
> ```
> GNU <http://www.gnu.org>
> ```
>
> which comes from:
>
> ```
>
> system within the free software collection
> .UR http://\:www.gnu.org
> GNU
> .UE .
>
> ```
>
> The text in the intermediate UTF-8 file has UTF-8 angle bracket
> characters.
>
> When I manually try to do translation of the intermediate UTF-8 file,
> I can confirm that iconv fails on the UTF-8 open angled bracket.
>
> Unfortunately the latest dev code seems to have removed the .UR and
> .UE reference so I can't show the problem in the dev line - so I'm
> curious if this was a bug or just a code change?
>
> I am seeing this failure on z/OS. When I try to run on the Mac via
> brew install, it works fine but it is a slightly newer time stamp of
> man page (23 December 2018).
>
> Apologies if this was already asked - I didn't see it when I searched
> for man page issues.

I in turn must apologize because I have a variety of independent
observations to share.

There have been significant changes to the groff man(7) package in
groff's Git repository since 1.22.4 was released. Since the groff(1)
man page is written using the man(7) macro package, changes to the
package affect the rendering of the document, just as changes to the
document itself can.

We are currently attempting to finalize groff 1.23.0 for release; the
latest release candidate, 1.23.0.rc3, was tagged a little over a month
ago. You can obtain it from the alpha.gnu.org site.

https://alpha.gnu.org/gnu/groff/

z/OS is one of the environments I'm intensely curious to hear feedback
about; we don't often hear from z/OS users on the groff development
list, and I am curious about many aspects of current character encoding
support on that operating system. There may be claims about code page
1047 in groff's documentation that are outdated and require correction.

Getting to the particulars of the issue you reported...

1. groff man(7) does indeed attempt to mark up URIs with angle bracket
   glyphs that are not found in the ISO 8859 character sets.

2. I feel like the man(1) program on your system should not be trying
   to get groff to format documents using the UTF-8 character encoding
   if they are going to have to be degraded to ISO Latin-1 afterwards.
   groff's terminal output driver, grotty(1), supports the production
   of ISO Latin-1 directly. I therefore feel that iconv(1) should not
   be involved.

3. iconv(1) could fall back to Unicode basic Latin "<" and ">"
   characters for the left and right angle brackets, U+2329 and U+232A,
   respectively.

4. So too could groff; we have a mechanism for defining fallbacks for
   unavailable glyphs. The special character escape sequences for
   angle brackets are \(la and \(ra. These are not portable to (some)
   descendants of AT&T troff, so groff man(7) uses them only if the
   formatter is GNU troff. However, at present, and even in groff Git
   HEAD, we do not define fallbacks for \[la] and \[ra] to < and >,
   respectively. I don't know why this has been overlooked for so
   long. Perhaps there is an argument against it? (Such definitions,
   if they were implemented, would appear in the tmac/tty.tmac or
   tmac/tty-char.tmac files, which are installed to a directory like
   /usr/share/groff/1.23.0/tmac.)

5. You might be able to work around this issue by editing your
   man.local file to define the fallbacks.
   On my Debian system, this file is installed to
   /etc/groff/man.local; I'm not sure where it appears in the build of
   groff for z/OS that you're using. We've improved our man pages to
   better document where these files go, but unfortunately those
   improvements are post-1.22.4 developments. Perhaps your system's
   package manager can help you locate this file.

   I'm hesitant to recommend a specific recipe for defining such
   fallbacks before I better understand how your system's man(1) is
   invoking groff(1), because what I suggest might not work. Also, my
   recommendation might do you no good if you don't have Unix
   permissions to edit the man.local file.

There is also the issue that even on a UTF-8 terminal, if the font the
terminal emulator is using doesn't cover these code points, the glyphs
won't appear anyway; neither groff nor man(1) has any means of
obtaining this information.

Please advise how you'd like to proceed, and I will try to help.

Regards,
Branden
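
P.S. To make point 3 concrete, here is a minimal Python sketch of the
kind of fallback I have in mind. It is only an illustration, not a
description of how iconv(1) or groff behaves: the function name and the
table are made up for this example, it uses Python's own ISO 8859-1
codec rather than iconv(1), and it assumes the brackets in the
intermediate file are the U+2329 and U+232A code points mentioned above.

```python
# Illustration only: a fallback table of the kind point 3 imagines.
# ANGLE_FALLBACKS and to_latin1_with_fallback are hypothetical names,
# not part of iconv(1), groff, or man(1).
ANGLE_FALLBACKS = {
    0x2329: "<",  # U+2329 LEFT-POINTING ANGLE BRACKET
    0x232A: ">",  # U+232A RIGHT-POINTING ANGLE BRACKET
}


def to_latin1_with_fallback(text: str) -> bytes:
    """Swap the angle brackets for ASCII, then encode as ISO 8859-1."""
    return text.translate(ANGLE_FALLBACKS).encode("iso-8859-1")


# Assumed sample: the URI markup as it might appear in the UTF-8 output.
sample = "GNU \u2329http://www.gnu.org\u232a"

# Without a fallback, the conversion is rejected outright.
try:
    sample.encode("iso-8859-1")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

print(direct_ok)
print(to_latin1_with_fallback(sample))
```

Encoding the sample directly raises an error because ISO 8859-1 has no
such characters, whereas the version with the fallback table yields
plain "GNU <http://www.gnu.org>".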