Hi Branden,

On Wed Nov 13, 2024 at 4:36 PM CET, G. Branden Robinson wrote:
> [...]
> > Latin1 characters continue working even when loading Latin2 as long as
> > they are specified as the respective UTF-8 codes.
>
> And they _should_ continue to hyphenate at appropriate locations because
> a set of hyphenation codes is associated with the hyphenation
> _language code_ ("en", "cs", "fr", etc.), which can change from
> environment to environment.
> [...]

Ah yeah, I forgot about the functionality of .hla. I wonder if this
works even when each language uses a different encoding though, given
that .tr and .trin are specified like so in groff(7):
  .tr abcd...
      Translate ordinary or special characters a to b, c to d, and
      so on prior to output.

  .trin abcd...
      As .tr, except that .asciify ignores the translation when a
      diversion is interpolated.

i.e. translation should happen on output, not on input, meaning that
using .hla might not be sufficient to switch between cs and fr, because
that doesn't switch the encoding used.

Those are just my thoughts based on the documentation, though.
I don't have the time to verify this.

> > My conclusion is that, given the intricacies of all this, loading the
> > appropriate localization file is THE way to set up hyphenation
> > correctly.
>
> Yes!  Our documentation does actually try to get this idea across.  If
> there are spots where you feel it is failing to do so, please bring them
> to my attention (but also base your recommendations on groff Git--I
> revise documentation all the time).

groff(7) does mention it, but it's among the last things mentioned in
the Hyphenation section. The texinfo manual doesn't mention it at all
in its section 5.1.3 about Hyphenation where I would expect it.
(At least the online version -- I haven't found any git source
for it, just tarballs.)

> > I feel like splitting the hyphenation part of localization files off
> > (into hycs.tmac etc.) would be beneficial in that one could load the
> > hyphenation settings for a given language without all the localization
> > strings.
>
> This, I'm less sure about.  The localization strings are namespaced, so
> the only real advantage to separating them is a minuscule reduction in
> formatter startup time, about which I have never read any complaints.
> [...]
>
> > Groff's documentation of hyphenation could then be updated
> > with a simple mention of
> >   .mso hycs.tmac
> > before specifying the technical details (.hy, .hla, .hpf, ...) which
> > ordinary users won't need to deal with.
>
> The existing recommendation for localization is to specify loading of
> the groff locale via the command-line `-m` option, _after_ loading any
> full-service package.
> [...]

The reason I was suggesting this is the fact that once one disables
hyphenation through .nh or .hy 0, the only way to re-enable it, as far
as I am aware, is to issue .hy with the proper hyphenation mode, which
depends on the language and might not be known by the user.

Separating the hyphenation portion into its own macro file would allow
one to re-enable hyphenation by issuing .mso hyLANG.tmac instead of
having to research the appropriate mode for the given language.

A simple macro could then be constructed which would offer a
friendlier interface to hyphenation. It could work like this:
  .HY
      Return to previous hyphenation settings (if set).
  .HY 0
      Disable hyphenation.
  .HY LANG
      Set hyphenation parameters appropriate for language LANG.
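
A rough sketch of such a macro, assuming that loading LANG.tmac sets
the hyphenation mode itself. The names HY and HY*saved are made up for
illustration, it loads the existing LANG.tmac rather than a split-off
hyLANG.tmac, and it restores only the mode, not the language:
  .\" Remember the mode in effect before HY is first used.
  .nr HY*saved \n[.hy]
  .de HY
  .\" No argument: go back to the most recently saved mode.
  .  ie \\n[.$]=0 .hy \\n[HY*saved]
  .  el \{\
  .\" Save the current mode, then disable or switch language.
  .    nr HY*saved \\n[.hy]
  .    ie '\\$1'0' .nh
  .    el .mso \\$1.tmac
  .  \}
  ..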

This would allow usage like so:
  .HY cs
  Příliš žluťoučký kůň úpěl ďábelské ódy...
  .br
  .HY 0
  .na
  https://\:www.gnu.org/\:software/\:groff/\:manual/\:groff.html
  .br
  .ad
  .HY

Of course, this wouldn't be necessary if .hy worked like .ad,
but (unless I am mistaken again :) it doesn't and cannot due
to desired compatibility with AT&T troff.

~ onf
