Hi! While using grohtml, I've noticed that long section headings produce word-spacing problems. As far as I understand the problem (please correct me if I've misunderstood something), grohtml first produces a PS document, from which the final HTML document is derived, so a long section heading makes groff's filling mechanism break the heading across several lines. So far, so good.
The expected behaviour is that, when post-grohtml reads groff's intermediate output, it replaces those line breaks with spaces and builds the heading string that goes inside the HTML <hn> tag (where n is the heading level). The problem is that post-grohtml emits nothing at these line breaks, so the last word of one line and the first word of the next are joined together in the final HTML document. As far as I have explored, this only happens inside section headings (delimited by devtag:.NH and devtag:.eo.h in the intermediate output). A similar situation occurs in the document title, but grohtml handles line breaks there without a problem.

At first, I thought the problem lay in the way the line breaks were encoded in the intermediate output, since they appear differently in the title and in the section heading. In the title (delimited by devtag:.tl and devtag:.eo.tl), a line break is encoded as:

    tlast_word_of_line_1
    n40 0
    V560
    H600
    x X devtag:.ce 9996
    x X devtag:.eol
    x X devtag:.br
    V560
    H600
    tfirst_word_of_line_2

while in a section heading the line break has the following form, without any device tags:

    tlast_word_of_line_1
    n40 0
    V800
    H0
    tfirst_word_of_line_2

Inside a section heading, the device tags x X devtag:.eol and x X devtag:.br aren't used to mark the line break, so I thought this was related to the problem. It turned out not to be, but I still wonder why breaks are encoded differently in the title and in the section heading...

Exploring the post-grohtml source code (post-html.cpp), I found the cause of the problem and was able to fix it, although I'm not happy with my degree of understanding of it. Inside the function do_heading(), which starts at line 2598, there is an if statement at line 2631 that checks whether the variable horiz is smaller than g->minh.
If it's true, a space character is appended to the header buffer. As I understand it (please correct me), this condition should be true whenever there is a new line in the heading, but it isn't, which explains the missing space between the words. If I force the condition to always be true, the behaviour is corrected and everything works as expected. I don't know the source code well enough to fully understand the logic behind the processing of section headings, but I would guess that either the conditional shouldn't exist and the space character should always be added at that point, or horiz is missing an update somewhere before the conditional, which prevents it from working as intended. So far, everything works fine with this fix, but I would be glad if someone more familiar with the source code could shed some light here, because I don't completely understand what I did.

Another, less intrusive way around this problem, which is more a workaround than a proper fix, is to disable filling of the section heading text:

    .am SH
    .nf
    ..
    .am NH
    .nf
    ..

This works because filling is re-enabled when groff encounters an ms paragraph macro. It prevents groff from breaking lines inside the section heading, so the problem doesn't occur: post-grohtml sees a single-line heading no matter how long it is. Moreover, it doesn't matter that, without filling, the section heading may extend past the page width of the PS file, because grohtml can still read it (and the PS file isn't meant to be read by a human). The only drawback I see is that, without filling, line breaks in the input become significant inside the section heading: the output lines will be broken wherever the input lines are, which should be kept in mind while writing the document.
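To make the horiz < g->minh reasoning concrete, here is a minimal C++ model of it. It is not the post-html.cpp code: the Word struct, the functions join_gap_only() and join_gap_or_newline(), and the coordinates in the example are all hypothetical. It assumes that horiz holds the right edge of the previous word and that each word also carries a vertical position. Under those assumptions, the gap-only test misses the wrap because the first word of the new line starts at H0, to the left of where the previous word ended, so horiz < minh is false.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a text record in post-grohtml.
// Only the fields this model needs: horizontal extent and vertical position.
struct Word {
    std::string text;
    int minh;  // left edge of the word
    int maxh;  // right edge of the word
    int minv;  // vertical position of the word's line
};

// Models the check described in do_heading(): a space is added only when the
// next word starts to the right of where the previous word ended.
// Assumption: horiz tracks the previous word's right edge (maxh).
std::string join_gap_only(const std::vector<Word>& words) {
    std::string out;
    if (words.empty())
        return out;
    int horiz = words.front().minh;  // ensures no space before the first word
    for (const Word& w : words) {
        if (horiz < w.minh)
            out += ' ';
        out += w.text;
        horiz = w.maxh;
    }
    return out;
}

// Variant: a change of vertical position (a wrapped line, as in the
// "V800 H0" fragment) also counts as a word boundary.
std::string join_gap_or_newline(const std::vector<Word>& words) {
    std::string out;
    if (words.empty())
        return out;
    int horiz = words.front().minh;
    int vert = words.front().minv;
    for (const Word& w : words) {
        if (horiz < w.minh || w.minv != vert)
            out += ' ';
        out += w.text;
        horiz = w.maxh;
        vert = w.minv;
    }
    return out;
}

// Example with made-up coordinates: "text" wraps to a new line at H0.
//   std::vector<Word> words = {{"Long", 0, 400, 760},
//                              {"heading", 500, 1100, 760},
//                              {"text", 0, 350, 800}};
//   join_gap_only(words)        yields "Long headingtext"
//   join_gap_or_newline(words)  yields "Long heading text"
```

Compared with forcing the condition to be always true, the variant only adds a space at a real horizontal gap or a line change. Whether that distinction matters depends on whether post-grohtml can deliver a single word as several adjacent text pieces, which I haven't checked.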
I would like to hear your thoughts on the behaviour described above. Is this a known problem? Are my fixes valid?

Best regards,
Daniel