Hi Oliver,

At 2025-05-14T20:40:19+0200, Oliver Corff via GNU roff typesetting
system discussion wrote:
> Thank you very, very much for your dedication and the profound insight
> you share with the groff user community.

Thank you!  I don't have enough insights to suit my own preferences.
I'd like to have more.  :)

> On my side, I am happy that my experiments brought a hidden (?) issue
> to the light of the day. It is interesting to see that, for heaven's
> sake, it is ms which stands out in stark contrast, if negatively, to
> the other macro packages. Yet, from an investigative point of view,
> even a failure can be very helpful indeed.

Very much so.

And I should note that there _is_ a workaround for the core dump: don't
load the ms package so late.

In other words you can do this:

$ groff -d paper=A5 -k -ms -T html /tmp/Territoriale_Ansprueche-GBR.ms >| /tmp/Territoriale_Ansprueche-GBR.html && echo it worked
troff:/tmp/Territoriale_Ansprueche-GBR.ms:551: warning [page 1, line 369]: cannot adjust line; overset by 13n
it worked

...where my changes to your document are simply this:

$ diff -U1 /tmp/Territoriale_Ansprueche.ms /tmp/Territoriale_Ansprueche-GBR.ms
--- /tmp/Territoriale_Ansprueche.ms     2025-05-14 13:36:13.105895478 -0500
+++ /tmp/Territoriale_Ansprueche-GBR.ms 2025-05-17 08:40:13.822639997 -0500
@@ -1,3 +1,3 @@
 .\" Compile with:
-.\" groff -d paper=A5 -P-pA5 -k Territoriale_Ansprueche.ms -Tpdf > TA.pdf
+.\" groff -d paper=A5 -P-pA5 -k Territoriale_Ansprueche.ms -ms -Tpdf > TA.pdf
 .\"
@@ -11,4 +11,4 @@
 .\"
-.mso s.tmac    \" Lade ms
-.mso de.tmac   \" Lokalisiere deutsch
+.\"mso s.tmac\"        Lade ms
+.mso de.tmac\" Lokalisiere deutsch
 .fam H

But the resulting document is still ill-formed; ms attempts to put the
entire text of the document into the first section heading, which it
hyperlinks.  Amusingly, the resulting gigantic <a href> element causes
problems for other software.

$ lynx /tmp/Territoriale_Ansprueche-GBR.html

A Fatal error has occurred in Lynx Ver. 2.9.0dev.6

Please notify your system administrator to confirm a bug, and
if confirmed, to notify the lynx-dev list.  Bug reports should
have concise descriptions of the command and/or URL which causes
the problem, the operating system name with version number, the
TCPIP implementation, and any other relevant information.

Do NOT mail the core file if one was generated.

Lynx now exiting with signal:  11

Aborted (core dumped)

> Anyway the document models of classical text vs. html (do HTML
> documents really have _pages_?) are fundamentally different.

Texinfo has an approach where you can generate one big HTML document
from the input, or a set of them (one per "node").  groff/grohtml
appears to do something similar; witness how it handles "pic.ms".

grohtml(1):
     -j output‐stem
            Instruct grohtml to split the HTML output into multiple
            files.  Output is written to a new file at each section
            heading (but see option -S below) named output‐stem-n.html.
[...]
     -S heading‐level
            When splitting HTML output (see option -j above), split at
            each nested heading level defined by heading‐level (or
            higher).  The default is 1.
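
As a rough sketch of how those two options reach grohtml from the groff
command line (via -P), assuming groff and its html device are installed.
The document "doc.ms", its contents, and the stem "chunk" are made up
for illustration; I have not shown any output because it depends on the
groff version.

```shell
# Sketch only: ask grohtml to split the HTML output at section headings.
# "doc.ms" and the stem "chunk" are hypothetical names.
workdir=$(mktemp -d)
cat > "$workdir/doc.ms" <<'EOF'
.TL
Opera Minora
.NH 1
First Section
.PP
Some text.
.NH 1
Second Section
.PP
More text.
EOF

if command -v groff >/dev/null 2>&1; then
    # -ms comes before the file operand, so the package is loaded early.
    # Each -P hands one argument through to grohtml; -S1 is the
    # documented default and is spelled out only for clarity.
    (cd "$workdir" && groff -ms -T html -P-jchunk -P-S1 doc.ms > doc.html)
    # Per the man page text above, split files are named chunk-n.html.
    ls "$workdir"
else
    echo "groff not found; skipping the formatting step"
fi
```

Raising the -S value should, going by the man page wording, also split
at more deeply nested headings.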

> So for a while, I thought that the -Thtml option was not at all
> supposed to be used with any of the classical macro packages but
> should have its own set of requests and macros.

My opinion is that, as with PDF, we should be able to bolt extensions on
to the existing full-service macro packages to support HTML.

Two decisions that Gaius and Werner made with respect to grohtml that
I'm uneasy about, and to which I assign blame (based on an understanding
that I admit is only partial) for the tool never getting out of beta
status, are:

1.  Attempting to support generation of HTML from "raw" *roff input; and
2.  Attempting to craft a single auxiliary macro package ("www.tmac")
    that could be combined with any existing full-service package.

I think (1) was too ambitious and demanded too difficult an impedance
match between the *roff document model and the HTML one.  HTML really
expects to deal with an entire document at a time.  *roff deals more
with an output line at a time, with provision for building collections
of lines up into queues (diversions).  I think that is why the
Mulley/Lemberg approach demanded a "mini-troff state machine" (MTSM)
inside the formatter, which is complex, has a heavy code footprint, and
has been the source of multiple crasher bugs.

For example, I can reproduce the crash you had with your input document.

$ groff -b -ww -d paper=A5 -k -T html /tmp/Territoriale_Ansprueche-GBR.ms >| /tmp/Territoriale_Ansprueche-GBR.html
troff: backtrace: 's.tmac':1019: macro 'ds@auto-end'
troff: backtrace: 's.tmac':1328: macro 'par*start'
troff: backtrace: 's.tmac':1346: string 'PP'
troff: backtrace: 's.tmac':296: macro 'PP'
troff: backtrace: file '/tmp/Territoriale_Ansprueche-GBR.ms':18
troff:/tmp/Territoriale_Ansprueche-GBR.ms:18: warning: register '0:ds-type' not defined
[...much more misery from the late-loading of s.tmac...]
troff:/tmp/Territoriale_Ansprueche-GBR.ms:21: warning [page 1, line 2, diversion 'ds*div', line 1]: cannot adjust line; overset by 1n
groff: error: troff: Aborted (core dumped)
$ gdb ./build/troff ./core
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/troff...
[New LWP 50062]
Core was generated by `troff -ww -dpaper=A5 -dwww-image-template=grohtml-50055- -Thtml'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f91b97d9537 in __GI_abort () at abort.c:79
#2  0x00007f91b998e329 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x0000561e7c3f5f1b in environment::construct_format_state (this=0x561e87bea9d0, nd=0x561e87bc67e0, nd@entry=0x561e87bcc120,
    was_centered=was_centered@entry=true, filling=1) at ../src/roff/troff/env.cpp:2507
#4  0x0000561e7c3f6e67 in environment::output (this=this@entry=0x561e87bea9d0, nd=nd@entry=0x561e87bcc120, suppress_filling=<optimized out>,
    vs=..., post_vs=..., width=..., width@entry=..., was_centered=true) at ../src/roff/troff/env.cpp:173
#5  0x0000561e7c3fb20b in environment::output_line (this=this@entry=0x561e87bea9d0, nd=<optimized out>, width=width@entry=...,
    was_centered=was_centered@entry=true) at ../src/roff/troff/env.cpp:1963
#6  0x0000561e7c3fdb3f in environment::possibly_break_line (this=this@entry=0x561e87bea9d0, must_break_here=must_break_here@entry=false,
    must_adjust=false) at ../src/roff/troff/env.cpp:2347
#7  0x0000561e7c3fe530 in environment::space (this=this@entry=0x561e87bea9d0, space_width=..., sentence_space_width=...)
    at ../src/roff/troff/env.cpp:559
#8  0x0000561e7c3fe61e in environment::space (this=0x561e87bea9d0) at ../src/roff/troff/env.cpp:529
#9  0x0000561e7c413e95 in process_input_stack () at ../src/roff/troff/input.cpp:3283
#10 0x0000561e7c41c234 in process_input_file (name=<optimized out>, name@entry=0x561e7c45cce1 "-") at ../src/roff/troff/input.cpp:9104
#11 0x0000561e7c420184 in main (argc=5, argv=0x7fff110f6fa8) at ../src/roff/troff/input.cpp:9460
(gdb) up
#1  0x00007f78d9f9f537 in __GI_abort () at abort.c:79
79      abort.c: No such file or directory.
(gdb) up
#2  0x00007f78da154329 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
(gdb) up
#3  0x000055f28e5fff1b in environment::construct_format_state (this=0x55f2bbd4a2f0, nd=0x55f2bbd2c4e0, nd@entry=0x55f2bbd2ba40,
    was_centered=was_centered@entry=true, filling=1) at ../src/roff/troff/env.cpp:2507
2507          nd->state->add_tag(MTSM_CE, centered_line_count + 1);

Yup, it's MTSM again.

Decision (2) is more defensible--in principle it's less work--but I
still come down against it.  Every macro package has its own idioms.
And as I see it there's not a whole lot that actually needs to be added
for hypertext support, which is the point of HTML.  You need (a) a way
to mark "anchor" spots in the text that can serve as the destinations
of hyperlinks and (b) a way to bracket a region of the document (usually
a run of text) as a hyperlink with a specified target.

We've done this for man(7), mdoc(7), and mom(7).  Deri's done it in some
unmerged work for ms(7) that he's asked me to look over.  Support in
mm(7) will probably look similar, as it can be described as an overgrown
ms(7).  That leaves me(7).
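
To make (a) and (b) concrete: the existing www.tmac spells them as .TAG
(anchor) and .URL (hyperlink).  The following is only a sketch of those
two primitives as that package offers them today, not of the
per-package extensions described above.  It assumes groff with its html
device is installed; the file name, the anchor name "claims", and the
text are invented for illustration.

```shell
# Sketch only: one anchor (.TAG) and one hyperlink to it (.URL), using
# the macros documented in groff_www(7).  All names are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/linked.ms" <<'EOF'
.TL
Opera Minora
.PP
See the
.URL #claims "section on claims"
for details.
.NH 1
Claims
.PP
.TAG claims
Text that the hyperlink above points to.
EOF

if command -v groff >/dev/null 2>&1; then
    # Load ms and www from the command line, ahead of the document.
    groff -ms -mwww -T html "$workdir/linked.ms" > "$workdir/linked.html"
    # Show whatever the formatter emitted for the anchor and the link.
    grep -n 'claims' "$workdir/linked.html"
else
    echo "groff not found; skipping the formatting step"
fi
```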

> *That* was the origin of my question, and that was why it did not even
> come to my mind that I should try a different macro package instead of
> ms. So it goes.
> 
> The last time I wrote that I managed to crash (groff: error: troff:
> Aborted (core dumped)) the compile run with the following command
> line:
> 
> $ groff -k myfile.ms -Thtml -w all > myfile.htm
> 
> with
> 
> .mso s.tmac
> 
> in the preamble of the document. (I am too lazy to build complex
> command lines even if it's just a cursor up movement that's necessary
> to invoke a complex command line again.)
> 
> During some experiments, I suddenly noticed that the crashes did not
> occur anymore. Since I had introduced multiple small changes in the
> course of my edit session, it took me a while to identify the culprit.
> 
> I had a title display which more or less went as follows:
> 
> .DS C
> .AU
> A. U. Thor
> .TL
> Opera Minora
> .DE
> 
> That display reliably crashed the compile run.

I've seen something similar, and I wouldn't be surprised if it were the
same issue.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?id=0652060f9c7d119257b370fa25759ff07205049c#n2938

> However, if I replaced
> 
> .DS C
> 
> with
> 
> .CD
> 
> (same visual result when compiled properly, let's say with -Tpdf)
> 
> then the compile run with -Thtml was successful in the sense that it
> did not abort prematurely. Very interesting!

I suspect another problem with MTSM.  Maybe the same problem.

I'll take this opportunity to note that this week I found and fixed a
33-year-old bug in groff, the command, which had the enraging property
of making grohtml harder to debug.

Here's the story.  https://savannah.gnu.org/bugs/?67133

Regards,
Branden
