Hi!

While using grohtml I've noticed that long text in a section heading
produces word spacing problems. As far as I understand the problem (please
correct me if I've misunderstood something), grohtml first produces a PS
document, which is used to derive the final HTML document, so a long
section heading makes groff's filling mechanism insert line breaks in the
heading. So far, so good.

The expected behaviour is that, when post-grohtml reads groff's
intermediate output, it replaces those line breaks with spaces and
produces the corresponding heading string to be included inside the
HTML <h"n"> tag (where "n" is the heading level).

The problem is that when post-grohtml finds the line breaks in the
intermediate output it doesn't produce any character, which results in no
spacing between the last word of a line and the first word of the next line
(they are joined in the final HTML document). As far as I have explored,
this only occurs inside section headings (delimited by devtag:.NH and
devtag:.eo.h in the intermediate output).

A similar situation also occurs in the document title, but there the
presence of line breaks is handled without a problem by grohtml.

At first, I thought it was a problem with the way the line breaks were
indicated in the intermediate output. I've noticed that they are presented
differently inside the title and inside the section heading. In the title
(delimited by devtag:.tl and devtag:.eo.tl), a line break is denoted by:

tlast_word_of_line_1
n40 0
V560
H600
x X devtag:.ce 9996
x X devtag:.eol
x X devtag:.br
V560
H600
tfirst_word_of_line_2

while in a section heading the line break is in the form below, without
device tags:

tlast_word_of_line_1
n40 0
V800
H0
tfirst_word_of_line_2

Inside a section heading the device tags devtag:.eol and devtag:.br aren't
used to indicate the line break, so I thought this was related to the
problem. It turned out not to be related, but I still wonder why there is
this difference between the breaks in the title and in the section
heading...
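
To make the expected behaviour concrete, here is a minimal sketch of
joining heading words across a line break. It is not taken from
post-html.cpp; join_heading_words() is a hypothetical helper, and it only
understands the handful of intermediate-output commands shown in the two
fragments above ("t" for a word, "n" for a line break).

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of groff: build the heading text from a
// simplified list of intermediate-output lines.
std::string join_heading_words(const std::vector<std::string>& lines) {
    std::string heading;
    bool pending_break = false;  // an 'n' command was seen since the last word
    for (const std::string& line : lines) {
        if (line.empty())
            continue;
        if (line[0] == 'n') {
            // Line break produced by filling: remember it, emit nothing yet.
            pending_break = true;
        } else if (line[0] == 't') {
            // A word: separate it from the previous word if a break came first.
            if (pending_break && !heading.empty())
                heading += ' ';
            pending_break = false;
            heading += line.substr(1);
        }
        // V, H and "x X devtag:..." lines carry no text and are skipped here.
    }
    return heading;
}
```

With either fragment as input (with or without the device tags), this
sketch yields "last_word_of_line_1 first_word_of_line_2", which is the
string I would expect inside the heading tag.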

Exploring the post-html source code (post-html.cpp), I found the cause of
the problem and was able to fix it, although I'm not happy with how well I
understand it. Inside the function do_heading(), at line 2598, there is an
if statement at line 2631 which checks whether the variable horiz is
smaller than g->minh. If it is, a space character is added to the header
buffer. As I understand it (please correct me), this condition should be
true when there is a new line in the heading, but it isn't, which explains
the missing space between the words. If I force the conditional to be
always true, the behaviour is corrected and everything works as expected.

I don't have enough knowledge of the source code to fully understand the
logic behind the processing of the section heading, but I would guess that
maybe the conditional shouldn't exist, and the space character should
always be added at that point of the code, or there is a missing update of
the variable horiz at some point before the conditional, which prevents the
conditional from working the way it was intended.
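
To illustrate the second guess, here is a small model of the check. It is
only a sketch under assumptions and does not reproduce the real
do_heading(): the Word struct, its maxh and vpos fields, and the
detect_line_change flag are hypothetical, and I am assuming that horiz
holds the right edge of the previous word. Under that assumption, a word
that starts a new line at the left margin has a small minh, so
"horiz < minh" is false and no space is added.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a positioned word. Only minh is named in the
// real source as far as this mail goes; maxh and vpos are assumptions.
struct Word {
    std::string text;
    int minh;  // left edge of the word
    int maxh;  // right edge of the word (assumed)
    int vpos;  // vertical position of the word's line (assumed)
};

// Build the heading string. With detect_line_change == false this mimics
// the behaviour described above; with true it also treats a change of
// vertical position as a word separator.
std::string build_heading(const std::vector<Word>& words,
                          bool detect_line_change) {
    std::string buffer;
    if (words.empty())
        return buffer;
    int horiz = words.front().minh;  // assumed: right edge of previous word
    int vert = words.front().vpos;
    for (const Word& w : words) {
        bool gap = horiz < w.minh;   // the check discussed above
        bool new_line = detect_line_change && w.vpos != vert;
        if (gap || new_line)
            buffer += ' ';
        horiz = w.maxh;
        vert = w.vpos;
        buffer += w.text;
    }
    return buffer;
}
```

For three words where the third starts a new line at horizontal position
0, the first variant joins the last two words and the second variant
separates them. Whether the real code should track the vertical position
like this, or simply always add the space, is exactly what I am unsure
about.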

So far, everything is working fine after this fix, but I would be glad if
someone more familiar with the source code could shed some light here,
because I don't completely understand what I did.

Another less intrusive way to fix this problem, which is more a bypass than
a proper fix, is to disable the filling of the section heading text, by
doing:

.am SH
.nf
..

.am NH
.nf
..

which works fine because filling is re-enabled when groff encounters an ms
paragraph macro. This prevents groff from breaking the lines inside the
section heading, so the problem won't occur, because post-grohtml will see
a single line heading no matter how long it is. Moreover, there is no
problem if the section heading extends further than the page width of the
PS file in the absence of filling, because grohtml will still be able to
read it (and the PS file isn't meant to be seen by a human). The only
problem I see in this method is that simple line breaks in the text
document will have semantic meaning inside the section heading in the
absence of filling, which should be kept in mind while composing the
document, as the output lines will now be broken where the input lines are.

I would like to read your thoughts about the described behaviour. Is it a
known problem? Are my fixes valid ones?

Best regards.

Daniel
