Re: Multiline @cindex entries misrender in groff Texinfo manual

2024-06-03 Thread Patrice Dumas
On Sun, Jun 02, 2024 at 11:58:46PM +0100, Gavin Smith wrote:
> I agree that it is undefined behaviour.  makeinfo could flag it as an
> error.  It may not be possible to modify texinfo.tex to stop reading
> the line at the end of the first line for input line
> 
> @cindex aaa@
> bbb
> 
> Perhaps a warning like this (of course the same change would have to
> be made to the XS parser):

Looks good to me.

> 
> diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
> index d39b0b14bd..a6bfab8014 100644
> --- a/tp/Texinfo/ParserNonXS.pm
> +++ b/tp/Texinfo/ParserNonXS.pm
> @@ -5428,6 +5428,12 @@ sub _handle_other_command($)
>        $command), $source_info);
>      }
>      if ($command eq "\n") {
> +      if ($self->_top_context() eq 'ct_line') {
> +        $self->_line_warn(
> +          "\@ should not occur at end of argument to line command",
> +          $source_info);
> +      }
> +
>        $current = _end_line($self, $current, $source_info);
>        $retval = $GET_A_NEW_LINE;
>      }



How to configure how groff hyphenates man pages (was: tctest.1 man page hyphenation comments)

2024-06-03 Thread G. Branden Robinson
[looping in groff list as this is something of a FAQ]

Hi Thomas,

At 2024-06-03T19:38:25-0400, Thomas Dickey wrote:
> Here's what I see with the 1.8 revision:
> 
> DESCRIPTION
>        tctest  exercises  the  termcap  library (or emulation of termcap) with
>        which it is linked.  It provides several command-line  options,  making
>        it  simple  to construct test-cases to compare implementations of term-
>        cap.
> 
> Call that overly-aggressive, then: it's predictable but reduces readability :-)

Okay.  _Personally_, I think that "term-cap" is a reasonable hyphenation
break point.  In linguistic terms, it's both a morpheme boundary and a
syllabification point.

> I was probably also grumbling about nroff hyphenating "error" and "Repeat",
> i.e., 
>   "er-" "ror"
>   "Re-" "peat"
> It also split
>   "parameters" as "pa-" "rameters"
>   "obsolete" as "ob-" "solete"
>   "default" as "de-" "fault"

Yes.  The default hyphenation mode (even for English) is pretty
aggressive.  They're TeX's hyphenation patterns; we just live with them.
;-)

But you, the reader of a man page, do not have to; see below.

> In a quick check, it hyphenated 11 lines out of the 84 non-blank lines,
> and of those 11, 6 have 2 characters before the hyphen.  8 of the 11
> lines do have at least one place where there's a double-space.
> 
> Preventing it from splitting termcap reduced that to 10 lines.
> 
> (I'd rather the feature was configurable so that I could force it to
> keep at least 3 characters before/after the split)

It is, and there are a few methods of doing so, depending on how much
control you want to exercise.

groff_man(7):

 -rHY=0   Disable automatic hyphenation.  Normally, it is
          enabled (1).  The hyphenation mode is determined by the
          groff locale; see section “Localization” of groff(7).

That is the most popular approach.  It's groff-specific, but causes no
harm elsewhere (it merely won't work; defining a register that some
other man(7) macro package pays no attention to damages nothing), and
has been in groff for a long time.

Authors
 The initial GNU implementation of the man macro package was written
 by James Clark.  Later, Werner Lemberg ⟨w...@gnu.org⟩ supplied the S,
 LT, and cR registers, the last a 4.3BSD‐Reno mdoc(7) feature.
 Larry Kollar ⟨kol...@alltel.net⟩ added the FT, HY, and SN
 registers; the HF string; and the PT and BT macros.

The `HY` feature dates back to 2003, and was included in groff 1.19
(April 2003).  This is so old that even old Mac OS X has it (before they
got rid of groff altogether in macOS Ventura).

What if you want finer-grained control over hyphenation?  For example,
what if you want hyphenation mode 8 instead of 4?  (And it sounds like
you, personally, do.)

groff(7):

Hyphenation
 When filling, groff hyphenates words as needed at user‐specified
 and automatically determined hyphenation points.
...

 Several requests influence automatic hyphenation.  Because
 conventions vary, a variety of hyphenation modes is available to
 the .hy request; these determine whether hyphenation will apply to
 a word prior to breaking a line at the end of a page (more or less;
 see below for details), and at which positions within that word
 automatically determined hyphenation points are permissible.
...

 8  disables hyphenation after the first two characters of a
    word.

This is an AT&T troff-compatible feature.  So, you can just put this in
your man.local file.  The groff_man(7) page's "Files" section documents
where this dwells.  On Debian systems, it's /etc/groff/man.local.

.hy 8
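
A sketch that goes slightly beyond the excerpt above, so treat it as an
assumption to check against your own groff(7): the hyphenation mode
values are additive, and mode 4 disables hyphenation before the last two
characters of a word.  If you want at least three characters on both
sides of the break, adding 4 and 8 should give you that.

```roff
.\" man.local fragment (sketch, not taken from the groff documentation).
.\" 4 = no break before the last two characters of a word
.\" 8 = no break after the first two characters of a word
.\" 4 + 8 = 12 applies both restrictions at once.
.hy 12
```

Use either this or the plain ".hy 8" line, not both; the later request
wins.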

The foregoing approach sometimes gets overridden by man page documents
that attempt to seize control of hyphenation themselves, and do it
wrongly.  An approach that we managed to purge ncurses's man pages of
back in October was this.

.\" Text formatted with(out) hyphenation as configured by user.
.nh
.\" page content with hyphenation off
.hy
.\" page content with hyphenation ON, using mode 1,
.\" which is wrong for English,
.\" and enables automatic hyphenation
.\" even if the user doesn't want it at all.

That's why I submitted patches to take that stuff out.  It's nothing but
trouble.  In the forthcoming groff 1.24, a new feature permits GNU troff
to do something sane even in the face of the above--but it's an
extension, so should not stop anyone from ripping the foregoing bad
pattern out of their man page documents.[1]

If you want to be really scrupulous, and/or are in the habit of
reading man pages in multiple languages (but still don't altogether hate
automatic hyphenation), then you really want to do the foregoing only if
the man page is in English.  As of groff 1.23, you can guard the request
with a conditional that queries what the "groff locale" is.

.if '\*[locale]'english' .hy 8
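
If the same man.local might also be read by a groff older than 1.23,
where the locale string does not exist, one possible variant is to test
for the string first.  This is only a sketch: it assumes the file is
read at the top level (hence the single backslash) and relies on GNU
troff's "d" conditional operator, which tests whether a name is
defined.

```roff
.\" man.local fragment (sketch).
.\" Only consult the locale string if this groff defines it,
.\" then apply mode 8 to English pages alone.
.if d locale \{\
.  if '\*[locale]'english' .hy 8
.\}
```

On a formatter without the string, the outer test fails and the
hyphenation mode is left as the macro package set it.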

If you want more finely grained control, GNU troff offers numerous
extension requests to