groff now undoing .ad settings after .IP

2024-03-14 Thread Russ Allbery
pod2man has, for a couple of decades, added:

.if n .ad l

to the top of every generated man page after the .TH macro on the grounds
that, for an output device that only has full-width spaces and no support
for subtle adjustments of interword spacing, ragged right is a more
readable default than full justification.

This broke in groff 1.23.0.  Now, any .TP directive restores full
justification for all subsequent text.  This appears to be due to the
addition of:

.  ad \\*[AD]

in an-write-paragraph-tag.  I'm not sure why this was added, but I've
already gotten multiple complaints about full justification cropping up in
pod2man-generated man pages with current versions of groff.

Assuming this is intentional, is there some way that I can avoid this
change in behavior shy of the brute force approach of adding .if n .ad l
after every .TP tag?  That seems fairly ugly and also further undermines
the intent of the AD register.

The other obvious option is:

.if n .ds AD l

before the .TH macro, which will technically work and which doesn't really
break anything that isn't already broken because pod2man-generated man
pages currently break the intent of the AD register anyway, but
groff_man(7) makes it clear that you don't want me to do that and I do
understand why.

I would be happy to support the AD register and let the user choose if
they have an explicit preference.  I just want to change the default under
nroff to l to stay consistent with the past 20 years of pod2man output.
Maybe we can find some way that I can signal to the an macro set that I
want a different default when running under nroff, but am happy to have my
chosen default overridden by the AD register if it's set?

(I understand that I can explain to users that they can set MANROFFOPT to
-dAD=l to get this behavior for every man page, and I like that this
option exists, but I'm not willing to take the regression and support hit
of having to explain this to every user.)
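
For concreteness, here is a rough sketch of what the brute-force approach
would look like.  The page name, date, and option text are placeholders,
not real pod2man output.  The point is only where the extra request would
have to go: after each tag line and before the paragraph body.

```roff
.\" Hypothetical page fragment; names and text are placeholders.
.TH EXAMPLE 1 2024-03-14 "example 1.0" "User Commands"
.if n .ad l
.SH OPTIONS
.TP
.B \-a
.\" Adjustment was reset when the tag was written, so ask for it again.
.if n .ad l
Description of the first option.
.TP
.B \-b
.if n .ad l
Description of the second option.
```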

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: groff now undoing .ad settings after .IP

2024-03-15 Thread Russ Allbery
Dave Kemper  writes:
> On Fri, Mar 15, 2024 at 8:34 AM G. Branden Robinson
>  wrote:
>> At 2024-03-14T22:02:26-0700, Russ Allbery wrote:
>> > Now, any .TP directive restores full justification for all subsequent
>> > text.  This appears to be due to the addition of:
>> >
>> > .  ad \\*[AD]
>> >
>> > in an-write-paragraph-tag.  I'm not sure why this was added,
>>
>> Here's the commit message.
>>
>> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=e7094b209f0f39fc16de687f116ea9a9c1ba0364

> It doesn't affect the larger point of this email, but the specific .ad
> call Russ cites (in an-write-paragraph-tag) appears to have been added
> in response to http://savannah.gnu.org/bugs/?62051 .

Right, the invocation in .TH was already handled because groff is not the
first an macro implementation to set the adjustment in the body of the .TH
macro, so pod2man puts the ".if n .ad l" line after the .TH invocation.
It was the change to the implementation of .TP/.IP that caused the
user-visible behavior change.

It's been enough years that I don't recall what other implementation set
adjustment in the .TH macro body, although I do remember originally
setting adjustment in the preamble and having to move it until after the
.TH invocation for it to be effective.  Given what systems I was doing
development on at the time, it's possible it was Solaris of the 2.6 or 7
vintage.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: groff now undoing .ad settings after .IP

2024-03-15 Thread Russ Allbery
"G. Branden Robinson"  writes:

> Can you name me a misbehaving perlpod(1)-generated page?  I'll check it
> out.

Yeah, pod2man(1) itself will do it on a Debian system with recent groff.
You'll see justification switch from ragged right to full in the body of
the first option documented in OPTIONS.

> Perhaps what I really needed here was:

Aha, yes, I was wondering if you could just save and restore the
adjustment rather than having to set it to the AD register.
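
A minimal sketch of the save-and-restore idea, which is not the actual
groff change: the read-only register .j holds the current adjustment mode
in a form that the .ad request accepts back.  The register name aJ is made
up for illustration.

```roff
.\" Save the current adjustment mode (aJ is a hypothetical register name).
.nr aJ \n(.j
.\" Temporarily change adjustment for some special formatting.
.ad l
Text that is set ragged right.
.br
.\" Restore whatever mode was in effect before, rather than forcing AD.
.ad \n(aJ
```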

> A long time for sure.  In Solaris _10_ troff's tmac.an, which has a
> GitHub repo,[1] `TH` has a relict ".if n .na"--that is, it's commented
> out.  But for how long it had been commented out, I have no idea.  And
> sure enough, SSHing to a Solaris 10 box and "man ls" tells me that
> adjustment to both margins is on.

> [Continues poking around, in non-publicly available sources...]

> Ah.  Same commented out request in the same tmac.an macro in...

> Drum roll, please...

> SunOS 4 (1988).

> Same thing in SunOS 3.5, SunOS 3.2, and SunOS 2.0 (1982!).

> So as with many decisions James Clark made with groff, he was aiming at
> straight-down-the-line Sun compatibility.  Mystery solved.

Must have been some other implementation where I ran into this, then.  I
think at the time I was testing on a pile of different commercial UNIXes.

Hm, I tracked down the commit message where I changed from ".if n .na" in
the preamble to ".if n .ad l" after .TH, and the comment claims it was for
groff.

* lib/Pod/Man.pm (guesswork): Recognize more uses of hyphens in
regular English text and allow them to be regular hyphens.
(preamble): Turn off hyphenation and, for nroff, justification
after the .TH macro since that's where groff turns them on.
* t/basic.t: Update for the new preamble.
* t/filehandle.t: Likewise.
* t/man.t: Likewise, and test the new hyphen behavior.
* t/basic.man: Adjust for the new hyphen behavior.

(Bad 2006 version of me, combining unrelated changes into a single
commit.)  It's possible that the comment is referring to the hyphenation
change and I just moved them both together.  I don't appear to have
written down why I switched from ".na" to ".ad l".

".if n .na" was in the generated output of Pod::Man of my very first
version in 1999, which strongly implies that it was in Tom Christiansen's
original pod2man script on which Pod::Man was originally based.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: groff now undoing .ad settings after .IP

2024-03-16 Thread Russ Allbery
Okay, I think we've sorted out the way forward for groff that would
address the immediate issue.  That leaves three questions: should I do
something in the next release of Pod::Man, should I assume that the next
release of groff will default to ragged right, and is there a way for
Pod::Man output to support the intent of the AD register.

"G. Branden Robinson"  writes:

> If I don't get back to you soon enough, go ahead with your initial idea.

> .if n .ds AD l

> You'll want to keep

>> .if n .ad l

> for the sake of non-groff formatters, of course.

I think the logic that would correctly honor the intent of AD and also
maintain backward compatibility would be something like this:

1. If groff before 1.23.0 or not-groff, add ".if n .ad l" after .TH.

2. If groff 1.23.0, add ".if n .ds AD l" before .TH.  Also having
   ".if n .ad l" after .TH would be harmless but insufficient.

3. If groff after 1.23.0... it's not clear what I should do, particularly
   since Pod::Man releases get baked into Perl core and thus have a long
   effective lifetime.  Ideally from my perspective groff would default to
   ragged right and then I should do nothing in that case, since then I
   would be honoring AD correctly.

(I don't know off-hand how to express that logic in roff, so if it does
seem warranted to go with version checks, I'll probably be asking for
assistance in writing them portably, assuming that this is even something
that groff exposes.  Or maybe there's a better proxy for the version
check, like seeing if the AD register is defined.)
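
One possible shape for such a check, offered as an untested sketch rather
than a recommendation: groff sets the .g register to a nonzero value and
exposes its major and minor version numbers in the .x and .y registers.
Other formatters should see an undefined .g as zero and skip the rest of
the line.  The .TH arguments are placeholders.

```roff
.\" Before .TH: only for groff 1.23, and only under nroff.
.if \n(.g .if \n(.x=1 .if \n(.y=23 .if n .ds AD l
.TH EXAMPLE 1 2024-03-16 "example 1.0"
.\" After .TH: still needed for older groff and non-groff formatters.
.if n .ad l
```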

There seem to be two groff development and release questions embedded in
here:

1. Do you think you'll change the long-standing groff default from full
   justification to ragged right under nroff in the an macro set for the
   next release?

2. Do you have an expected time frame for the next release?  That controls
   whether it's worth bothering with the second piece of logic above.  I'd
   prefer to avoid releasing man pages that change justification partway
   through in the next stable release of Debian, but if a new release is
   coming before the next Debian stable release, it may not be worth
   embedding a fix for one specific groff version in every generated man
   page.

If the answer to 1 is no, then there are other possible questions about
how I could signal desire for a different default (maybe using some other
register, for instance) without interfering with application of AD.  Or
maybe this is overkill and Pod::Man just doesn't support AD, although in
general my development philosophy for Pod::Man is to try to converge on
normal man page practices as much as I can given that POD is a far more
limited language than roff and thus I can't represent some subtleties that
roff can.

> We're getting close to 35 years of ".ad b" being _groff's_ default,
> and even if I think a lot of people won't consciously notice the
> base paragraph indentation change--instead observing that, "huh,
> lines fit now when they didn't used to, that's nice, but
> weird"--_everybody_ will notice a change to adjustment, and some
> will be as loud and destructive as the author of the Worst Man Page
> That Has Ever Existed was.  I think it virtuous to stand up to
> brogrammers.

For what it's worth, in over 20 years of maintaining Pod::Man with this
design choice, I've never gotten a complaint about it, and multiple people
noticed when it broke.  Admittedly, the breakage is worse than always
doing full justification, since it changes in the middle of the man page
and thus looks more wrong, but my anecdotal experience is that people
quite like this "feature" of Pod::Man and the advocates for full
justification in nroff seem to be few and far between.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: groff now undoing .ad settings after .IP

2024-03-17 Thread Russ Allbery
"G. Branden Robinson"  writes:
> At 2024-03-16T12:32:44-0700, Russ Allbery wrote:

>> Okay, I think we've sorted out the way forward for groff that would
>> address the immediate issue.  That leaves three questions: should I do
>> something in the next release of Pod::Man, should I assume that the
>> next release of groff will default to ragged right, and is there a way
>> for Pod::Man output to support the intent of the AD register.

> In summary, I think the answers are, in order:

> 1.  Probably, yes.
> 2.  No.  I have no plans to change it.
> 3.  Not at present.  I'd like to create one for groff 1.24.

Okay, I think this narrows it down.  The Perl 5.40 release is coming up,
and I'd like to get the next release of podlators into it, with the hope
that Debian will go to Perl 5.40 before the trixie release.

I think I have two options for the upcoming podlators 6.0.0 release:

* Do nothing.  Justification will be broken with groff 1.23 and fixed with
  groff 1.24, which will hopefully make it into the trixie release.

* Add ".if n .ds AD l" to the preamble, and leave ".if n .ad l" after .TH
  as is.  This will make the formatting work everywhere the way that it
  historically has (although will get pod2man output no closer to
  supporting AD properly) and means that the time frame for groff 1.24
  becomes irrelevant to podlators, at the cost of an additional line in
  every generated man page.

I think I'm leaning towards the second option because it's simpler and
provides the maximum decoupling of timelines, and one additional line in
the preamble isn't too much of a price.  But I could be talked out of it
if anyone thinks that's the wrong approach.  Note that this will mean the
creation of a whole lot of man pages in the wild that override the AD
register, which will probably be with us for many years to come.  (But
this is already true of all the man pages with ".if n .ad l", which is
arguably an even bigger hammer.)

I can then revisit and potentially add more complicated code should we
figure out a way for pod2man to change the nroff default justification but
still honor AD if set.
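
For reference, the second option amounts to a preamble along these lines.
This is a sketch with placeholder .TH arguments, not the literal Pod::Man
output.

```roff
.\" Set the AD string before .TH so groff 1.23 keeps ragged right in nroff.
.if n .ds AD l
.TH EXAMPLE 1 2024-03-17 "example 1.0" "User Commands"
.\" Unchanged from earlier releases; covers formatters that ignore AD.
.if n .ad l
```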

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: the Courier font family and nroff history

2024-03-22 Thread Russ Allbery
"G. Branden Robinson"  writes:

> That's a good argument against grotty(1) emitting overstriking
> sequences, at least by default, and yet that the people swiftest to
> anger on this subject argue _for_ it.

I'm not fully following this argument, but (assuming I've not completely
lost the train of conversation), it may be relevant here that some years
ago (it was in 2000, which surely was only five or six years ago) a
contributor went to the trouble of writing Pod::Text::Overstrike to format
POD output with backspacing with overstrike or underscores.  At the time,
a version that used termcap already existed (and still does).

The stated reason was that the output was device-independent, unlike
output that embeds formatting codes derived from device-specific termcap
entries, and they really liked the bold and underlining rather than the
plain text or *ad hoc* markup produced by Pod::Text.

I know that to a first approximation all the world is now some variation
of an imaginary VT100 terminal emulator, and thus one can usually blindly
use SGR escape sequences and expect them to work in much the way that one
can assume all programs only run on VMS.  But I have occasionally had
reports that Pod::Text::Overstrike is a better option for (some) Windows
users because apparently their pager handled the overstriking but termcap
(via the Perl Term::Cap module) wasn't available.

I have no idea how dated this information is, having not used Windows
myself in several decades, but I always found it interesting.  I've kept
the module working all these years since it's not much additional effort.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: the Courier font family and nroff history

2024-03-22 Thread Russ Allbery
> [...]rt to us, that Windows 10 or 11 has a console
> driver/terminal emulator that does "better" with ECMA-48 support.  I
> haven't heard even a rumor of anything usefully quantitative, like a
> table of its support for standardized escape sequences in comparison
> with, say, xterm, or even the Linux kernel's somewhat wobbly virtual
> console device.  But, supposedly, things are "better".

You may find the NOTES section of the Term::ANSIColor man page
interesting, although it's probably out of date since I rely on other
people to send me updates.  Term::ANSIColor also comes with test files and
a test file generator that one can cat on a display to see what works.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: the Courier font family and nroff history

2024-03-24 Thread Russ Allbery
"G. Branden Robinson"  writes:

> It sounds like the way is clear to change perldoc's default back to
> Pod::Man plus nroff.  ;-)

See https://github.com/briandfoy/pod-perldoc/pull/36.  It looks like it's
being worked on, but there's apparently some complexity downstream of both
of us in figuring out a good pager to use.

> Thomas Dickey has a huge section in the ncurses FAQ about this.

> https://invisible-island.net/ncurses/ncurses.faq.html

> Scroll down to:
>   Why not make “xterm” equated to "xterm-256color"?

This reminded me that it had been a while since I'd looked around Thomas
Dickey's web site, and I spent a very enjoyable day reacquainting myself
with his writing style.  :)

> My copy may be even more out of date than yours, but let's see if we can
> get at least this claim fixed.

>     Support for code 3 (italic) is rare and therefore not mentioned
>     in that table.  It is not believed to be fully supported by any
>     of the terminals listed, although it’s displayed as green in the
>     Linux console, but it is reportedly supported by urxvt.

> grotty(1):
>     -i     Render oblique‐styled fonts (I and BI) with the SGR
>            attribute for italic text rather than underlined text.
>            Many terminals don’t support this attribute; however,
>            xterm(1), since patch #314 (2014‐12‐28), does.  Ignored if
>            -c is also specified.

Ah, thank you.  Confirmed.  I suspect at some point in the past I forgot
that I have to configure xterm to use a TrueType font in order to get a
lot of the font behavior.  (I use a bitmap font because it's substantially
more readable for long periods of time than any TrueType font I've found
at equivalent sizes, but using a bitmap font disables some of xterm's font
family support.)

Also, some of the other parameters that weren't supported the last time I
checked seem to be supported now by at least xterm, namely "crossed-out"
or strikethrough and double-underline.  I need to add those to the module.

I've made some updates, but I am very overdue for another round of work on
that module.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: Debian Salsa and groff license

2024-07-07 Thread Russ Allbery
Colin Watson  writes:

> Yes.  I don't know how to fix this; as far as I'm aware it's not within
> the power of project owners to set this manually (much though that seems
> completely unreasonable).

> It might be worth filing a support issue with salsa.debian.org's admins
> to see whether there's some piece of the licence detection machinery
> that they can kick.

I believe GitLab uses this thing to automatically determine the license of
a project:

https://github.com/licensee/licensee

Looking at the source code, I think it's giving precedence to LICENSES
over COPYING and picking up the copy of the MIT license in that file.

Automatically detecting license information in any complicated scenario is
probably impossible, so it would be nice if GitLab had some way to
override the automatically-detected license.  If it does, I haven't been
able to find it.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



podlators v6.0.0 released

2024-07-10 Thread Russ Allbery
I'm pleased to announce release v6.0.0 of podlators.

I usually don't announce releases here, but since there was an extensive
discussion of some of the changes that went into this release, I thought
I'd send the release announcement.  Let me know if you'd like me to send
podlators release announcements here in the future when they have changes
to Pod::Man.

podlators contains Pod::Man and Pod::Text modules which convert POD input
to *roff source output, suitable for man pages, or plain text.  It also
includes several subclasses of Pod::Text for formatted output to terminals
with various capabilities.  It is the source package for the Pod::Man and
Pod::Text modules included with Perl.

Changes from previous release:

 - Drop support for Perl 5.10.  podlators now requires Perl 5.12 or later.

 - podlators now uses semantic versioning for the package and module
   versions, with a v prefix to work with Perl's packaging system.

 - Pod::Man now translates all "-" characters in the input into *roff "\-"
   escapes (normally rendered as an ASCII hyphen-minus, U+002D) rather
   than using fragile heuristics to decide which characters represent true
   hyphens and which represent ASCII hyphen-minus.  The previous
   heuristics misrendered command names such as apt-get, causing search
   and cut-and-paste issues.  This change may cause line-break issues with
   long hyphenated phrases.  In cases where the intent is a true hyphen,
   consider using UTF-8 as the POD character set (declared with =encoding)
   and using true Unicode hyphens instead of the ASCII "-" character.

 - Pod::Man now disables the special *roff interpretation of "`" and "'"
   characters as paired quotes everywhere, not just in verbatim text, thus
   forcing them to be interpreted as the regular ASCII characters.  This
   also disables the use of "``" and "''" for paired double-quotes.  The
   rationale is similar to that for hyphens: there is no way to tell from
   the POD source that the special interpretation as quotes is intended.
   To produce paired typographic quotes in the output, use UTF-8 and
   Unicode paired quote characters.

 - Man page references in L<> that are detected as such by Pod::Simple are
   now always formatted as man page references even if our normal
   heuristic would not detect them.  This fixes the formatting of
   constructions such as @@RXVT_NAME@@perl(3), which are used by packages
   that format a man page with POD and then substitute variables into it
   at build time.  Thanks to Marco Sirabella for the analysis and an
   initial patch.  (GitHub #21)

 - Add a workaround to Pod::Man to force persistent ragged-right
   justification under nroff with groff 1.23.0.  Thanks to Guillem Jover
   for the report and G. Branden Robinson for the analysis.  (GitHub #23)

 - Fix wrapping of text with S<> markup in all subclasses of Pod::Text.
   Thanks to Jim Avera for the report.  (GitHub #24)

 - Pod::Man now forces a blank line after a nested list that contains only
   =item tags without bodies.  In previous versions, the blank line before
   the next item in the surrounding =over block was not included.  Thanks
   to Julien ÉLIE for the report.  (GitHub #26)

 - Import PerlIO before checking for layers so that PerlIO::F_UTF8 is
   available, which fixes double-encoding of output when a :utf8 layer is
   in place and PerlIO is not imported.  Thanks to youpong for the bug
   report, James Keenan for the elaboration, and Graham Knop for the fix.
   (GitHub #25)

 - pod2text --help now exits with status 0, not 1, matching normal UNIX
   command behavior and the behavior of pod2man.  (GitHub #19)

 - Fix tests when NO_COLOR is set in the environment.  (GitHub #20)

You can download it from CPAN or from:

<https://www.eyrie.org/~eagle/software/podlators/>

This package is maintained using Git; see the instructions on the above
page to access the Git repository.

Please let me know of any problems or feature requests not already listed
in the TODO file.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: .CW usage by a new user

2024-09-05 Thread Russ Allbery
"isf (Jordán)"  writes:

> Thats literally what Im trying and the output is something like this:

> Input:
> .CW ""$ echo foobar""

> Ouput:
> echo$

You need three quotes, not two.  Semantically, the whole macro argument is
enclosed in double quotes, and then the double quotes within that
double-quoted string have to be escaped, which *roff does by doubling
them.  So:

.CW """$ echo foobar"""

With only two double quotes, I think *roff parses that as an empty string.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: .CW usage by a new user

2024-09-05 Thread Russ Allbery
"isf (Jordán)"  writes:

> Thanks, but what if I want do something like this:

> .CW "$ echo "foobar""

> Because the output is something like this:

> bar"$ echo foo

> When I want something like this in the PDF.

> $ echo "foobar"

It may help to go through this step by step, or at least that's the way
that I think about it (probably because I mostly write code to generate
*roff rather than write it directly).

Please note that all the intermediate steps here are, by themselves,
invalid, and are just being shown as steps in the process.  Only the final
step is the correct input.

You want that string to all be in a fixed-width font.  So to start with,
you put .CW in front of it:

.CW $ echo "foo   bar"

But the .CW macro takes a single argument and without any surrounding
quotes that argument is going to end with the first space, so you need to
enclose the entire argument in double quotes so that it's one argument:

.CW "$ echo "foo   bar""

Now you have the problem that the interior double quotes that should be
part of the argument are going to be interpreted as ending the quoted
string that comprises the argument.  All of those double quotes inside the
quoted argument have to be escaped.  The way to do that is to replace each
interior double quote with two consecutive double quotes:

.CW "$ echo ""foo   bar"""

This looks like the string ends with three double quotes, but it's better
to think of it as one escaped double quote, which is written as "", and
then the double quote that ends the argument.

This should then produce the output that you want.
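
If you want to see how the formatter splits the arguments, a small
diagnostic macro can help.  This is only a sketch: Xa is a made-up macro
name, and it writes the argument count and the first argument to standard
error with the .tm request.

```roff
.\" Xa is a hypothetical helper: report the argument count and first argument.
.de Xa
.tm \\n(.$ argument(s), first is [\\$1]
..
.\" Interior quotes doubled: should report a single argument.
.Xa "$ echo ""foo   bar"""
.\" Interior quotes not doubled: the first argument is cut short.
.Xa "$ echo "foo   bar""
```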

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: [tz] Doubts about a typo fix

2022-11-25 Thread Russ Allbery
Paul Eggert via tz  writes:

> Thanks for the info about groff. You're right, tzdb man pages are supposed
> to be portable to both groff and traditional troff. For the latter I test
> with /usr/bin/nroff and /usr/bin/troff on Solaris 10, which is the oldest
> troff I know that is still supported.

[...]

> "\f(CW-\fP" is used instead of plain "-" because when the output is PDF,
> it is more clearly visible to humans as a hyphen-minus instead of as a
> hyphen (U+2010 HYPHEN).

You have to be very careful with the combination of \f(CW and \fP on
Solaris 10 nroff, and I suspect the construct you are using has latent
bugs.  \f(CW doesn't produce a font change on Solaris 10 with nroff, so
if you write something like:

\fBsomething\fP \f(CW-\fP something else

you will discover that "something else" is in bold because the second \fP
reverts to the "previous" font, which nroff thinks is \fB because \f(CW
was ignored.  (Just tested now on a Solaris 10 host.)  Pod::Man has fairly
elaborate workarounds for this bug.
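
As a sketch of the general idea, not Pod::Man's actual output, one
defensive pattern is to name the font to return to instead of relying on
\fP.  This assumes the surrounding text is in the roman font.

```roff
.\" Fragile: if \f(CW is ignored, the second \fP returns to bold.
\fBsomething\fP \f(CW-\fP something else
.\" More defensive: select roman explicitly rather than "previous".
\fBsomething\fR \f(CW-\fR something else
```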

>> I also note that "CW" is an old, AT&T device-independent
>> troff-compatible font name.[3] groff's preferred name for this face is
>> "CR", because for the past couple of decades a monospace font (often
>> Courier) has generally been available in all four styles (roman,
>> oblique, bold, and bold-oblique).

> Thanks, I didn't know that was preferred. I installed the attached patch
> into the tzdb development repository

Just be warned that \f(CR is not a valid font name in all *roff
implementations, which is why Pod::Man uses \f(CW by default.  Not sure
how much you care.  (And, to be honest, not sure how much anyone should
care about any implementations other than groff and mandoc these days.)

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: [tz] Doubts about a typo fix

2022-11-26 Thread Russ Allbery
"G. Branden Robinson"  writes:

> It's my lucky day!  I've been meaning to buttonhole you for quite some
> time regarding my man(7) reforms and pod2man's output.

I just made another major release, so on the plus side my brain is fully
up to speed with the source, but on the minus side, I'm also a little
tired of working on it.  :)  But, anyway, do tell!

I have some very old mail about better compatibility with the output for
mandoc's HTML conversion sitting around somewhere that I need to respond
to as well.

The standard problem is that I'm still trying to stick as much as possible
to my mission of producing portable *roff, but testing on anything other
than recent Debian with groff is tedious and annoying, so I mostly try not
to change things unless there's an obvious bug.

> For what it's worth, groff and Heirloom doctools nroff don't print
> "something else" in bold (this is true even in Heirloom's default, _not_
> groff compatibility, mode), and DWB 3.3 nroff does.

Yes, I think this bug is specific to Solaris, although it was still
present in Solaris 11.

> Any other fonts, a document needs to test for and be programmed
> defensively regarding.[3]  (It's okay to give up with ".ab".)

Pod::Man uses B, I, CW, CB, and CI only.  (To be honest, part of me is
very tempted to drop the C* typefaces because they're quite annoying to
deal with and the cause of a bunch of bugs, and I'm dubious enough people
still use troff to make it worth the effort, but apparently the HTML
converters may use them and in theory they work now.  So I have left them
alone.)

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: [tz] Doubts about a typo fix

2022-11-26 Thread Russ Allbery
"G. Branden Robinson"  writes:

> For what it's worth, groff and Heirloom doctools nroff don't print
> "something else" in bold (this is true even in Heirloom's default, _not_
> groff compatibility, mode), and DWB 3.3 nroff does.

Oh, incidentally, I ran into what felt to me like a bug in groff that I
think has been there for a while.  Two people noticed it within a month,
but I think the bug has been around for quite a while.

The new comment in Pod::Man largely explains it:

# Originally, this function was much simpler because it went directly from \fB
# to \f(CW and relied on \f(CW clearing bold since it wasn't \f(CB.
# Unfortunately, while this works for mandoc, this is not how groff works;
# \fBfoo\f(CWbar still prints bar in bold.  Therefore, we force the font back
# to the default before each font change.

This sadly results in some rather tedious font manipulation in Pod::Man,
although most of the font complexity is still due to the Solaris bug.
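
A sketch of the difference the comment describes, using \fR as the default
font.  It illustrates the approach and is not Pod::Man's literal output.

```roff
.\" Direct switch from bold to CW: groff may keep rendering "bar" in bold.
\fBfoo\f(CWbar\fR
.\" Return to the default font first, then select CW.
\fBfoo\fR\f(CWbar\fR
```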

I'm guessing that \f(CR would have cleared bold, and \f(CW doesn't because
it's weird and special, and that's why this isn't a bug?

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: [tz] Doubts about a typo fix

2022-11-28 Thread Russ Allbery
Narrowing this down to the groff list since it doesn't really apply to the
other lists, but please cc me on replies since I'm not subscribed.

"G. Branden Robinson via tz"  writes:

> Most people won't see a difference because groff 1.22.4 (and earlier
> releases going back to, I think, 2009) the man(7) macro package remaps
> the hyphen to the minus sign on the 'utf8' output device.  This will be
> changing in groff 1.23 to improve consistency with man page rendering on
> typesetters.[1]  Workarounds are documented.[2]

Debian may have to override this locally again.  I remember the days
before 2009 when this was the case, and it caused no end of problems,
usually with cutting and pasting switches or code examples.  Sadly, a
large number of upstream man pages used - incorrectly.  (Even putting
aside the problem that, technically, you need \N'45' or some similar thing
because \- is supposed to be minus rather than the ASCII hyphen, as noted
in your second link.)
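
To illustrate the distinction with a hypothetical fragment: characters the
reader may type or paste are written with \-, while a plain - is left for
hyphens in ordinary prose.

```roff
.\" Hypothetical fragment: \- in the command, plain - in prose.
Run
.B apt\-get \-\-help
for a summary of the command-line options.
```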

We tried to clean up all the problems previously, but it was like bailing out
the ocean, and I think we stopped once groff changed its default mapping.

> mandoc maintainer Ingo Schwarze and I both recommend against performing
> string definitions, or interpolating strings, in man pages.

Pod::Man of course does tons of this.  I'm always open to alternatives,
but they're all there for a reason.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: man(7), hyphen, and minus

2022-12-13 Thread Russ Allbery
Just a quick reply on one part of this with more to come later.

"G. Branden Robinson"  writes:

> Oh, I know.  I've seen Pod::Man's preamble.  I think what distressed me
> originally about it was that, like docbook-to-man, it seemed to make
> man(7) seem like a write-only language.

This bothers me too, and I made some choices for ease of implementation
rather than readability of output.  (I generally like to prioritize
readability of output; my static site generator cares more about the
readability of the HTML than any sane person probably should.)

The biggest loss there is that I always use font escapes (with elaborate
workarounds for font strangeness in both Solaris nroff and in groff)
rather than what any sane human would do, which is use .B, .BI, .BR, etc.
The specific problem that I have is that I was trying to avoid doing
whole-tree transformations on the POD parse tree, so the transformation is
done locally.  In other words, in something like B<< bold I<italic> >>, I
first get a function invocation of cmd_i with text "italic", and then an
invocation of cmd_b with text "bold " followed by whatever cmd_i returned.
It's a bit tricky to turn that into "\fBbold \f(BIitalic\fR" but it doesn't
require any state tracking.  But if I transform I<italic> into
".I italic", life felt rather complicated and I wasn't sure if I was going
to be able to figure out where to go from there, particularly because
there's a bunch of weird complexity about quote escaping required to use
the macros.
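
To illustrate the kind of local transformation described above, here is a
minimal sketch in Python.  It is not Pod::Man's actual code (which is Perl
and handles many more cases); the function names cmd_i and cmd_b are borrowed
from the text, and the string-rewriting approach is an assumption made for
illustration.  Each function sees only the text it is handed, so no global
state is needed:

```python
def cmd_i(text):
    """Format italic text using only the text handed to this call."""
    return "\\fI" + text + "\\fR"


def cmd_b(text):
    """Format bold text whose body may already contain cmd_i output."""
    # Promote nested italic to bold italic.  \f(BI is the roff syntax for
    # selecting a font with a two-character name.
    inner = text.replace("\\fI", "\\f(BI")
    # A nested reset to roman has to return to bold instead.
    inner = inner.replace("\\fR", "\\fB")
    # A trailing switch back to bold is redundant before the closing \fR.
    if inner.endswith("\\fB"):
        inner = inner[: -len("\\fB")]
    return "\\fB" + inner + "\\fR"


# Inner formatting runs first, then the outer call rewrites its result.
print(cmd_b("bold " + cmd_i("italic")))  # prints \fBbold \f(BIitalic\fR
```

The sketch ignores literal backslashes in the user's text and every font
other than bold and italic, which is where the real complexity lives.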

I'm also trying to stay very portable and for a long time I knew there
were a bunch of proprietary implementations out there that did random
things (never mind Solaris, what about HP-UX which does some other weird
things).  So for example I don't use .EE/.EX and instead roll my own,
which is kind of sad, let alone stuff like .TQ, .UR, or .SY.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: man(7), hyphen, and minus

2022-12-23 Thread Russ Allbery
"G. Branden Robinson"  writes:

> That's fair, and it isn't the first time I've heard capable people
> express the opinion that having a document translator produce idiomatic
> man(7) font alternation macro calls rather than chains of font selection
> escape sequences was Just Too Damned Hard.  If I could show people how
> to do it, I might do so with a swagger, but I confess I can't cash that
> check at present.

Yeah, the difficulty lies mostly in the layering, because people can write
POD source that is nonsensical in a man page context but that I still have
to do something with.  Stuff like C<<< B<< L<foo(1)> >> >>>.  It makes no
sense to make the man page reference, which one could otherwise nicely
represent as:

.BR foo (1)

also bold and fixed-width, but if that's what someone wrote in the POD
source, I have to do *something* with it.  And that means either trying to
analyze global state or having to parse the *roff that I output in an
earlier stage.

> Here, I know your pain.  I took it upon myself to document this shit.

Thanks for this, I should have thought to look at the groff manual about
it.  That corrected a few of my misconceptions about macro arguments.
(It's very easy for this stuff to all become cargo-cult.  I refer to CSTR
54 all the time, but of course that's limited in its detail.)

> I sure hope the reason this was done the way it was because any more
> accessible approach ran the PDP-11 out of memory.  Murray Hill's
> agonizingly slow adoption of 'aq' and 'dq' special character identifiers
> I find difficult to explain given that they bought and paid for a font
> that included these glyphs on their very first typesetting device.

Yes, it's frustrating that one can't portably just use the special
character escapes everywhere.

The additional problem that Pod::Man has is that I want to add double
quotes around literal text if and only if I'm rendering with nroff.  With
troff, the font change is sufficient and I don't want to add quotes.  The
simplest way to do that normally is with a string that's defined to either
the empty string or the quote mark depending on whether rendering is with
nroff or troff, but this causes no end of hassles when it's inside macro
arguments, not to mention the need to work around Solaris bugs with font
changes.

I'm fairly sure there's some better way of handling this than what I'm
currently doing, but my brain has not managed to come up with it yet.

> Whither this antipathy for the neutral apostrophe?

This has been an interesting long-term struggle.  It was the GNU coding
style for years to use `' as matched quotes.  I think they've finally
switched to Unicode quotes instead.  Technically, of course, the English
apostrophe isn't neutral; it's curved to the left.  But the ASCII
character is used and abused for a bunch of different things that aren't
really apostrophes.

> With the last proprietary Unixes finally retiring to their coffins or at
> least throwing in the towel on any delusions of troff maintenance, maybe
> people will take up some of these conveniences at last.

Speaking as someone maintaining a generator, it's very difficult to know
when I can drop support for old Unixes.  It's also very painful to be
wrong; if I delete a bunch of compatibility code, and then later someone
really wants it back, adding it back in is awful.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: man(7), hyphen, and minus

2022-12-23 Thread Russ Allbery
"G. Branden Robinson"  writes:

> Right.  Four or five years ago I proposed a new groff special character
> identifier `\(hm` to cover this case.  But this was not met with assent,
> and I concede that the problem may be confined to man pages.

I've been curious: how much use do you see of groff outside of man pages?
I dropped a bunch of troff formatting guesswork and magic from Pod::Man
that had caused no end of maintenance problems because I have seen little
evidence of anyone using it for something other than man pages, and in my
day job supporting research science I don't hear anyone talking about roff
in that context either.  (Everything is LaTeX.)

I've therefore started optimizing Pod::Man for manual pages, although if
anyone reports problems with any other use I try to keep it working.

As a related question, are there grand plans for adding more Unicode
support?  I noticed that, for example, troff from groff as installed on
Debian appeared to have fairly rudimentary Unicode font support.  It
looked like the default font was missing a bunch of characters, it didn't
handle combining accent marks when I tried, etc.  It's possible that I was
testing incorrectly, though.

> I also see the wisdom in Werner Lemberg's decision years ago to close
> groff's predefined special character identifier name space to any
> expansion without damn good reason.

Yeah, at this point I would recommend everyone switch to Unicode and try
to support it as well as possible, although that doesn't help with cases
where the debate is over how to render pre-existing ASCII characters.

> The EOLing of Solaris troff is fat with the promise of opportunity.
> It's my hope that Illumos won't need much of a nudge to jump to groff,
> Heirloom Doctools, or neatroff, any of which would be an improvement
> because they're _maintained_.

> (Well, Heirloom has slowed _way_ down...[2])

I had not heard of Heirloom Doctools or neatroff before, although I don't
follow this field very closely.  Do you know if any platform uses them for
man pages right now?  The two implementations I mostly target are groff
and mandoc, since that seems to cover the vast majority of modern systems
and the remainder are using some legacy UNIX code base that basically
doesn't exist outside of that UNIX.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: man(7), hyphen, and minus

2022-12-23 Thread Russ Allbery
"G. Branden Robinson"  writes:

> For the (neutral) double quote, you have recourse to an obscure
> syntactical feature of AT&T 'troff'.  Because a double quote can begin a
> macro argument, the formatter keeps track of whether the current
> argument was started thus, and doesn't require a space after the double
> quote that ends it.(2)  (*note Calling Macros-Footnote-2::) In the
> argument list to a macro, a double quote that _isn't_ preceded by a
> space _doesn't_ start a macro argument.  If not preceded by a double
> quote that began an argument, this double quote becomes part of the
> argument.  Furthermore, within a quoted argument, a pair of adjacent
> double quotes becomes a literal double quote.

Incidentally, the rules for the second argument to .ds appear to not
follow the normal rules for macro arguments.

.ds C` ""

defines \*(C` to a single double-quote, but:

.ds C` """"

defines \*(C` to """, not to " as one might expect if that were
interpreted as a quoted argument and then adjacent double quotes become a
literal double quote.

So far as I can tell, the correct rule for escaping the second argument to
.ds is that you should double any *leading* double quote, but leave all
the other double quotes alone.

.ds C` "

appears to define \*(C` to the empty string.  (I'm not sure where this all
might be documented.)
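
The escaping rule inferred above (double a leading double quote, leave all
others alone) can be sketched in a few lines of Python.  This is only an
illustration of the rule as observed in the three examples, not code from
Pod::Man, and the function names are hypothetical:

```python
def escape_ds_value(value):
    """Escape a string for use as the second argument to .ds.

    Assumption taken from the observations above: .ds strips exactly one
    leading double quote and copies the rest of the line verbatim.
    """
    if value.startswith('"'):
        return '"' + value
    return value


def ds_request(name, value):
    """Build a .ds request line defining the string `name` as `value`."""
    return ".ds " + name + " " + escape_ds_value(value)


# A single double quote needs one extra leading quote.
print(ds_request("C`", '"'))  # prints .ds C` ""
```

Quotes that are not at the start of the value pass through unchanged, so
escape_ds_value('say "hi"') returns its input as is.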

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: man(7), hyphen, and minus

2022-12-24 Thread Russ Allbery
"G. Branden Robinson"  writes:
> At 2022-12-23T10:03:13-0800, Russ Allbery wrote:

>> Yeah, the difficulty lies mostly in the layering, because people can
>> write POD source that is nonsensical in a man page context but that I
>> still have to do something with.  Stuff like C<<< B<< L<foo(1)> >> >>>.

> The *roff language does not maintain a stack of typeface changes.  How
> radical a change to POD would it be to reject constructions like the
> above?

Basically impossible, alas.  It's been part of the language since the
beginning and is supported by all the other output formatters (and these
days, people arguably use the HTML output more than the man page output).

If I were designing POD again from scratch, I would do various things
differently, but at this point it is what it is.  A refrain that I'm sure
sounds familiar when maintaining groff.  :)

>> The additional problem that Pod::Man has is that I want to add double
>> quotes around literal text if and only if I'm rendering with nroff.
>> With troff, the font change is sufficient and I don't want to add
>> quotes.

> A lot of man pages use bold for literals, even on terminal devices.  I
> tend to in groff's own pages, but I _also_ quote multi-word or
> potentially ambiguous literals in case the man page is viewed in a
> context that strips the typeface (like copying and pasting into an
> email).

Yeah, Pod::Man has a whole bunch of special rules about what to quote and
what not to quote (which are now configurable as of the latest release),
mostly designed for documenting Perl.

Part of the challenge of maintaining Pod::Man is that Tom Christiansen
wrote the original intentionally to be magical: it was supposed to
understand what you were trying to do and add markup for you as much as
possible so that you didn't have to.  This was way back in the day when
that sort of thing was briefly in vogue.  Now, I think all of us have
learned that explicit is better.  :)

> Sort of.  I'd say more that it finally acknowledged the existence of ISO
> 8859 (free ECMA-94 copy here[2]).  So at long last they advise people to
> simply use ' and ", each paired with themselves.[3]

Oh, interesting.  My recollection is that GCC switched over to Unicode
quotes, so it sounds like there's some complexity here.  Or it may just be
that ' and " are the right choice if you don't already have a whole
translation layer in place so that you can downgrade Unicode quotes to
something else if you don't think you're in a Unicode environment.

>> Speaking as someone maintaining a generator, it's very difficult to
>> know when I can drop support for old Unixes.  It's also very painful to
>> be wrong; if I delete a bunch of compatibility code, and then later
>> someone really wants it back, adding it back in is awful.

> Does that mean you're not hopeful that you will be dropping support for
> Solaris troff soon after Oracle does?

I truly don't know.  I will at least seriously consider it, because
supporting it is sufficiently painful that I'd love to stop, although
discovering that groff has a variation of the same problem (\fBfoo\f(CWbar
shows "bar" in bold) means that I gain less than I would have hoped for by
removing the Solaris compatibility code.  Although that problem goes away
if I can safely use \f(CR instead of \f(CW, it looks like; I'm not sure
how portable that is, but that may be the right direction.

(The Solaris problem is that \fB\fP\f(CW\fP leaves the font set to B.  In
both cases, the problem seems to be that CW is not a "real" font.)

> I learned the following from Paul Eggert on this list just last
> month.[4]

> PE> Solaris 10 is no longer supported after January 2024, so if it and
> PE> all the other traditional troffs die out by 2024 we can stop
> PE> worrying about this then.
> PE>
> PE> Solaris 11.4, the only Oracle Solaris version that is planned to be
> PE> supported after January 2024, is based on groff 1.22.3 instead of on
> PE> traditional troff. See:
> PE>
> PE> https://docs.oracle.com/cd/E88353_01/html/E37839/troff-1.html
> PE> https://www.illumos.org/issues/12692

> This could buy you a lot of elbow room.

The thing about Pod::Man is that it's part of Perl core, so I try really
hard to be as portable as Perl is because I don't want my software to be
the thing that breaks on some obscure supported platform.  And Perl is
*really* portable (see, for instance, all the EBCDIC handling in Pod::Man)
and I have no idea if they're going to be willing to drop Solaris support
just because it's end of life.

I know Paul uses this rule for all of the software that he maintains, and
it makes a lot of sense, but I

Re: man(7), hyphen, and minus

2022-12-24 Thread Russ Allbery
"G. Branden Robinson"  writes:

> These are all correct statements.

> There are two major points to make here.

> 1.  Request invocations are not macro calls.  So all that stuff about
> double quotes we were talking about doesn't apply here.  :(
> Sorry about that.  That was a language design decision made
> literally before I was born.

> 2.  `ds` and related requests (like `as`) are unusual even among
> requests in that they treat the entire remainder of the input line
> as a single argument.

>> (I'm not sure where this all might be documented.)

> Would it surprise you to learn that I've rewritten parts of groff's
> Texinfo manual to discuss these matters in detail?  ;-)

Ah, excellent, thank you.  Pod::Man 5.01 will fix some latent bugs in this
area (mostly only relevant if people configure unusual quote characters
for C<> text).

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: man(7), hyphen, and minus

2022-12-24 Thread Russ Allbery
"G. Branden Robinson"  writes:
> At 2022-12-23T12:49:15-0800, Russ Allbery wrote:

>> I've been curious: how much use do you see of groff outside of man
>> pages?

> Others have answered this but I would also point you to Ralph Corderoy's
> page on the subject.

> https://www.troff.org/pubs.html

> It hasn't been updated since about 2006, I think, which means it has
> missed a few publications since then, like _The Go Programming Language_
> and Kernighan's _UNIX: A History and Memoir_.

Thanks!  Happy to see the continuing usage!

I probably should have assumed.  One of the things that I've noticed over
and over about free software is that nothing new ever truly replaces
something old in a comprehensive sense.  I can think of very few programs
that truly no one is using any more, because once the source code is
available to keep them alive, someone will keep them alive.  It makes for
a rather interesting diversity of software (and other things; for
instance, I still use Usenet).

> The groff_man(7) page has long attempted to prescribe a reasonably
> portable, reduced subset of the roff language for use in man pages.
> mandoc maintainer Ingo Schwarze and I spent some time prior to groff
> 1.22.4's release hammering that out in further detail.

Oh, so I was going to mention: currently, Pod::Man rolls its own macros
for verbatim text:

.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..

This looks basically equivalent to .EX/.EE, so I thought about using those
macros (and defining my own if they're not available, at least until no
one is using older implementations that don't have them).  But the main
thing that .EX doesn't support that the long-standing Pod::Man behavior
does is the .ne invocation, which is used like this:

# Get a count of the number of lines before the first blank line, which
# we'll pass to .Vb as its parameter.  This tells *roff to keep that many
# lines together.  We don't want to tell *roff to keep huge blocks
# together.
my @lines = split (m{ \n }xms, $text);
my $unbroken = 0;
for my $line (@lines) {
    last if $line =~ m{ \A \s* \z }xms;
    $unbroken++;
}
if ($unbroken > 12) {
    $unbroken = 10;
}

This logic is very long-standing and was designed for troff printing of a
manual page (and older nroff setups that still did pagination) to avoid
unnecessary page breaks in the middle of a verbatim block.  I'm not sure
how much this matters given how people use man pages these days, but I
hate to break it for no reason.  So I think I'd need to add an .ne line
after (before?) the .EE macro if I switched to it?
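
As a rough sketch of what a switch to .EX/.EE might look like, the Python
below mirrors the line-counting logic above and emits the .ne request
before .EX.  The placement is an assumption, chosen because .Vb issues .ne
before any of the verbatim text is read; it is not a confirmed answer to
the question.  The function names and the escaping of backslashes and
leading control characters are likewise illustrative, not Pod::Man's code:

```python
def unbroken_lines(text):
    """Count the lines before the first blank line, as the Perl above does."""
    count = 0
    for line in text.split("\n"):
        if line.strip() == "":
            break
        count += 1
    # Don't ask *roff to keep huge blocks together.
    if count > 12:
        count = 10
    return count


def verbatim_block(text):
    """Wrap text in .EX/.EE, preceded by .ne to reserve vertical space."""
    out = []
    keep = unbroken_lines(text)
    if keep > 0:
        out.append(".ne " + str(keep))
    out.append(".EX")
    for line in text.rstrip("\n").split("\n"):
        # \e prints a literal backslash.
        line = line.replace("\\", "\\e")
        # \& keeps a leading . or ' from being read as a control character.
        if line.startswith((".", "'")):
            line = "\\&" + line
        out.append(line)
    out.append(".EE")
    return "\n".join(out) + "\n"
```

For a block whose first paragraph is two lines long, this emits ".ne 2"
followed by the .EX/.EE pair around the escaped text.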

> It's called Pod::_Man_: why would people use it for anything that isn't
> a manual page?

Okay, fair.  :)  Although historically people sometimes did, and of course
once upon a time people would sometimes typeset the full manual for
something with troff.  That output probably isn't as nice as it used to be,
since I have subsequently dropped a lot of the attempted magic that only
applied to troff output (replacing paired " quotes with `` '', adding
small caps to long strings of all capital letters, and things like that)
because they were all using scary regexes and occasionally broke things
and mangled things in weird ways, causing lots of maintenance issues.

> Yes.  But there are two problems to solve: (1) acceptance of Unicode
> (probably just UTF-8) input

I was pleasantly surprised at how well this just worked with the man-db
setup on a Debian system, although I think that may involve a fair amount
of preprocessing.

> It has been possible for many years (since well before groff 1.22.3) to
> specify any Unicode code point for output.

Just to provide additional detail for the record (and this is almost
certainly the sort of thing you mean by "acceptance of Unicode input")
here's the simple document I was using for some testing.

https://raw.githubusercontent.com/rra/podlators/main/t/data/man/encoding.utf8

% groff -man -Tpdf -k encoding.utf8 > encoding.pdf
troff: encoding.utf8:72: warning: can't find special character 'u0308'
troff: encoding.utf8:74: warning: can't find special character 'u1F600'

u1F600 is presumably a problem with the output font, but u0308 is a
combining accent mark that groff does definitely support, just not as a
separate character.  (Without preconv, one instead gets mojibake, as I
expected.)

My theory was that combining accent marks pose a bit of an interesting
issue for groff because groff probably shouldn't think of them as a
separate output character that can be mapped in an output font, but
instead needs to essentially transform them into something like
\[u0069_0308] during the input processing.  (This may therefore
essenti