Follow-up Comment #29, bug #64360 (project groff):

[comment #27 comment #27:]
> First I'd like to try to reduce the scope of this discussion, since it seems to have grown in multiple directions.

Sure.

> Am I correct in the assumption that the grout files for any given input would not be identical when produced by different roff implementations?

Yes. Here's some simple input given to Heirloom Doctools _troff_ and then GNU _troff_.

$ printf -- '.nf\na b\n-\\-\n' | ./bin/troff -Tps
x T ps
x res 72000 1 1
x init
V0
p1
x font 1 R /home/branden/heirloom/lib/doctools/font/devps/R.afm 4
x font 2 I /home/branden/heirloom/lib/doctools/font/devps/I.afm 4
x font 3 B /home/branden/heirloom/lib/doctools/font/devps/B.afm 4
x font 4 BI /home/branden/heirloom/lib/doctools/font/devps/BI.afm 4
x font 5 CW /home/branden/heirloom/lib/doctools/font/devps/CW.afm 4
x font 6 H /home/branden/heirloom/lib/doctools/font/devps/H.afm 4
x font 7 HB /home/branden/heirloom/lib/doctools/font/devps/HB.afm 4
x font 8 HX /home/branden/heirloom/lib/doctools/font/devps/HX.afm 4
x font 9 S1 /home/branden/heirloom/lib/doctools/font/devps/S1.afm 516
x font 10 S /home/branden/heirloom/lib/doctools/font/devps/S.afm 1028
s10
f1
x X LC_CTYPE en_US.UTF-8
H72000
V12000
ca
wh7770cb
n12000 0
H72000
V24000
c-
h3330C\-
n12000 0
x trailer
V792000
x stop

$ printf -- '.nf\na b\n-\\-\n' | troff -Tps
x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s10000
V12000
H72000
md
DFd
ta
wh2500
tb
n12000 0
V24000
H72000
t-
C\-
h5640
n12000 0
x trailer
V792000
x stop

In the Heirloom output, I find the line wh7770cb noteworthy for a reason I'll return to.

> I assume this is true, else the output drivers from different implementations would be interchangeable.

I don't believe they are. I have a _feeling_ that Kernighan might have been reaching for this, but didn't quite nail down the syntax tightly enough to strictly permit it.
However, I could be retrojecting my thoughts in 2023 onto his in 1980, when (a) there was no other device-independent troff implementation, (b) one was not going to appear for nearly a decade, and (c) given the challenges he described in CSTR #97, he might have thought it unlikely that another would ever be created. As further speculation, (d) in 1980 the eventual divestiture of the AT&T monopoly was not yet seen as inevitable, and as long as that hope held, there wasn't much need for anyone to reimplement device-independent troff, since it could be had for low or zero cost. If anyone held that belief, it was the most swiftly overturned of these.

> This means, whilst they may all be based on the information in cstr#54, each roff has developed its own private API between the formatter and output drivers. For this reason the decision on whether this is a change to the groff version of the API, has to be confined to what is contained in the groff documentation.

Or we can accept that _groff_'s documentation doesn't adequately describe its implementation, which I believe I just demonstrated in bug #63544.

> Empirical observation shows that groff uses a simple rule of one operation per line

No. Glyph output and horizontal motions are frequently mixed on one line when the `tcommand` directive is not present. If we remove it from font/devps/DESC, we get this.

$ printf -- '.nf\na b\n-\\-\n' | ./build/test-groff -T ps -Z | grep cawh
cawh6940

And the mixing is ubiquitous when the "obsolete" (to use Bernd's term) output command is used. (I would term it "legacy".)

$ printf -- '.nf\na b\n-\\-\n' | groff -T X100 -Z | grep caw
caw10bh7

Further (recalling the point I promised to return to above), Heirloom breaks this rule in yet another respect. It puts a `c` command immediately after an `h` command with no separation at all, even though the `h` command's argument is _not_ fixed-width (the integers it uses are not zero-padded on the left).
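
To make the lexical point concrete, here is a minimal Python sketch of a tokenizer for command runs like the ones shown above (`wh7770cb`, `cawh6940`, `caw10bh7`, `h3330C\-`). It is not _libdriver_'s actual algorithm. The command classes are a simplification of CSTR #54 and groff_out(5), and `tokenize` and its tuple output are hypothetical. The sketch shows why a parser cannot rely on one command per line or on whitespace separation: `c` takes exactly one character, `w` takes none, the legacy `10b` form takes exactly two digits and a glyph, and an integer argument such as `h`'s ends only at the first non-digit.

```python
import re

# Hypothetical sketch for illustration; not libdriver's actual algorithm.
# Command classes are a simplification of CSTR #54 / groff_out(5).
INT_RE = re.compile(r"-?\d+")
INT_COMMANDS = set("hvHVfsp")   # one integer argument, ending at the first non-digit
LINE_COMMANDS = set("xDnm#")    # argument runs to end of line (simplified)
WHITESPACE = " \t\n"


def tokenize(grout):
    """Split a run of grout into (command, argument) pairs."""
    tokens = []
    i, end = 0, len(grout)
    while i < end:
        cmd = grout[i]
        i += 1
        if cmd in WHITESPACE:
            continue                      # optional separator between commands
        if cmd == "w":                    # paddable word-space marker: no argument
            tokens.append(("w", ""))
        elif cmd == "c":                  # exactly one glyph character follows
            if i >= end:
                raise ValueError("'c' at end of input")
            tokens.append(("c", grout[i]))
            i += 1
        elif cmd in "Ct":                 # glyph name or text, ending at whitespace
            j = i
            while j < end and grout[j] not in WHITESPACE:
                j += 1
            tokens.append((cmd, grout[i:j]))
            i = j
        elif cmd in INT_COMMANDS:         # variable-width integer, not zero-padded
            m = INT_RE.match(grout, i)
            if m is None:
                raise ValueError(f"{cmd!r} at offset {i - 1}: missing integer")
            tokens.append((cmd, m.group()))
            i = m.end()
        elif cmd in LINE_COMMANDS:        # consume the rest of the line
            j = grout.find("\n", i)
            j = end if j == -1 else j
            tokens.append((cmd, grout[i:j].strip()))
            i = j
        elif cmd.isdigit():               # legacy "ddc": two digits of motion, then a glyph
            if i + 2 > end:
                raise ValueError("truncated ddc command")
            tokens.append(("ddc", cmd + grout[i:i + 2]))
            i += 2
        else:
            raise ValueError(f"unknown command {cmd!r} at offset {i - 1}")
    return tokens
```

For example, `tokenize("wh7770cb")` yields `[("w", ""), ("h", "7770"), ("c", "b")]`. The only way to find where `h`'s argument ends is to notice that `c` is not a digit.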

> and using a single space to avoid a "clashing between the command code and the arguments without the space.", even though Kernighan states that it is permitted to use newlines as well as a space and tab for this purpose, none of our drivers support this.

I believe I have empirically refuted this claim in bug #63544, comment #3.

> The w command is not an operation, it is just a marker for a paddable word space so following with a newline is against our own documentation:-

I find it unnecessary to bifurcate the class of "output commands" into "operations" and "non-operations". It certainly isn't required to explain the present (and, I have to guess, decades-long) behavior of _libdriver_.

> in 'gtroff''s intermediate output, every command with
> at least one argument is followed by a line break,

This is demonstrably false, as I showed above.

> thus providing
> excellent readability.

...and that's an unnecessary sales pitch.

> The w command has no arguments so under this rule it should not have a following new line.

I think you have extrapolated an invalid rule from vague and inaccurate documentation.

> As regards having a space after the w command, our documentation says:-
> The 'gtroff' output parser, however, is smart about whitespace by making it
> maximally optional.
> Which I take to mean it only uses a space to avoid the "clashes" mentioned above,

This is another sales pitch, and vague besides.

> and it further says:-
> Commands and arguments with a known, fixed length need
> not be separated by syntactical space.

It does say that. Unfortunately, this claim contradicts one you already quoted.

> in 'gtroff''s intermediate output, every command with
> at least one argument is followed by a line break,

According to the above, does a command with at least one fixed-length argument get followed by a line break or not? I don't think that question is answerable without relying on an additional information channel, like reading the source code or experimenting.
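
The separator question can be made concrete with a minimal sketch. It uses a hypothetical helper, not _libdriver_ code, that reads an integer-argument command such as `h` under two rules: CSTR #54's, where space, tab, and newline are interchangeable, and a narrower space-and-tab-only rule. Which rule the current _groff_ drivers actually implement is the empirical question at issue here, and the sketch does not settle it. `read_int_command` and both constants are illustrative names.

```python
import re

# Hypothetical helper for illustration; these names are not from groff's code.
CSTR54_SEPARATORS = " \t\n"  # CSTR #54: space, tab, and newline are interchangeable
SPACE_TAB_ONLY = " \t"       # the narrower behavior described in the quoted comment
INT_RE = re.compile(r"-?\d+")


def read_int_command(text, pos, separators):
    """Read the command letter at text[pos] and its integer argument.

    Any run of characters from `separators` may sit between the letter and
    the number. Returns (letter, value, position after the number).
    """
    letter = text[pos]
    pos += 1
    while pos < len(text) and text[pos] in separators:
        pos += 1
    m = INT_RE.match(text, pos)
    if m is None:
        raise ValueError(f"{letter!r}: expected an integer at offset {pos}")
    return letter, int(m.group()), m.end()
```

Under the CSTR #54 rule, `h\n2500` reads as a motion of 2500 units. Under the space-and-tab rule, the same input is a syntax error.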

You may be beginning to see why I am critical of this documentation.

> The w command is fixed length, so to satisfy "maximally optional" no space is used.

As noted previously, I cannot elicit any clear semantics from the modifier "maximally" here.

> I never said that white space cannot follow a w command, but if we change to include white space after it then that goes against how the groff version of the API has been documented for many years.

That, I agree with. Our documentation in this area is inaccurate. Please understand that I am exercising restraint by not saying more.

> I believe the change Branden has on his private branch is to output a new line after a w command, but this bug concerns white space after the w. Contrary to cstr#54, which classes space/tab/newline as the same, groff does not allow newline to be used as the white space between a command and its arguments (this difference is not documented).

That's an interesting point, and one I want to explore with further testing. I see no reason _groff_ output drivers _shouldn't_ accept a newline in that position, given the clarity of CSTR #54's wording on the subject.

> If groff 1.23++ is going to use w followed by a new line, none of the proposed patches is optimal. A loop is no longer required, since no further commands will be on that line. There is no point in producing code to cater for situations which will not arise. Gropdf is written to parse grout output from groff, if that output is altered so that it no longer complies with our own documentation and gropdf fails to handle it then it is not a bug, but a change in the API and should involve a change request and at least a wider discussion than just us three.

I propose that _gropdf_ accept the same inputs _grops_ does and interpret them in a compatible way. That is all.

> Apparently, the reason for wanting to make this change is to "generate "grout" that is more easily lexically analyzed".
> Citing posix shell's poor lexical processing capabilities, I don't see what difference wh2500 and w\nh2500 makes. If that's what you want, would a simple filter like this help:-
> [derij@pip busgrap]$ perl -pe 's/(.)(.*)/$1\n$2/ if m/^w/; s/^(.)(\S.*)/$1 $2/mg' zfile
> x T ps
> x res 72000 1 1
> x init
> p 1
> x font 5 TR
> f 5
> s 10000
> V 12000
> H 72000
> m d
> D Fd
> t Deri
> w
> h 2500
> D l 100000 0
> n 12000 0
> x trailer
> V 792000
> x stop

I don't think writing such a tool is desirable. It "feels" too small to be a shippable tool. It would not justify the weight of making it a proper command with `--help` and `--version` and a man page, or the effort of coming up with a good name for it. It is also too long to expect anyone to type, and too obscure to make it into many people's shell start-up files as a function. In my opinion, GNU _troff_ should simply produce output that is easy for humans to read in the first place.

> In fact I'd be very happy to write a proper grout tool with multiple output options (pretty print, markup, XML). Markup could look like:-
> x T ps # for grops
> x res 72000 1 1
> x init
> p1 # Page 1
> x font 5 TR
> f5 # Times-Roman
> s10000 # ps 10
> V12000 # V 1/6th in
> H72000 # H 1in
> md # Default text colour: Black
> DFd # Default fill colour: Black
> tDeri
> wh2500 # Word Space: 2.5p
> Dl 100000 0 # Line from x,y to x1,y1
> n12000 0 # New Line
> x trailer
> V792000
> x stop
> Using some of the code from gropdf which keeps track of current position, XML output could tag the x,y position of every element.

That would indeed be useful--a "grout annotator", if you will. But I think it's outside the scope of this ticket.

> So it is unnecessary to change the format of grout to achieve what you say you want.

As noted above and in bug #63544, I don't have to. We just need to correct the documentation and align _gropdf_ with the other output drivers here.
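
For comparison, here is a rough Python transliteration of the Perl filter quoted above. It assumes line-at-a-time input, as with `perl -p`. `spread_line` and `spread_grout` are hypothetical names. The sketch shows how narrow the filter's assumptions are: it only splits a `w` that begins a line, and it only inserts a space after a line's first character.

```python
import re

# Rough transliteration of the quoted Perl one-liner; names are hypothetical.
# Assumption: input arrives a line at a time, as with `perl -p`.
LETTER_THEN_ARG = re.compile(r"^(.)(\S.*)", re.MULTILINE)


def spread_line(line):
    """Put a leading 'w' on its own line, then space each command from its argument."""
    if line.startswith("w"):
        line = line[0] + "\n" + line[1:]        # s/(.)(.*)/$1\n$2/ if m/^w/
    return LETTER_THEN_ARG.sub(r"\1 \2", line)  # s/^(.)(\S.*)/$1 $2/mg


def spread_grout(text):
    """Apply spread_line to every line of a grout listing."""
    return "\n".join(spread_line(line) for line in text.splitlines())
```

For GNU _troff_'s default `-T ps` output for this input, `wh2500` becomes `w` followed by `h 2500`. A run such as `cawh6940`, where other commands precede the `w`, comes out as `c awh6940`, just as it would from the Perl original. So the filter is a heuristic for one output style, not a general grout tokenizer.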

> The danger in changing the current grout format is we do not know what tools have been written which parse our current grout format, didn't someone write a parser which output html/javascript, how do we know our changes won't affect them.

If they accept what _grops_, _grotty_, _grodvi_, _grohtml_, and so forth accept, then they'll be fine. They do risk disruption if they relied upon our badly composed documentation in this area. If I don't treat CSTR #54 as scripture, as some of our mailing list subscribers do, I'm surely not going to bring a higher level of reverence to our own documentation, particularly where it has demonstrable problems.

> Given that there are non-intrusive methods to achieve the result you want, I hope your hankerings can be satisfactorily assuaged.

I have to reject your conclusion here as ill-premised. Fortunately, I see no need to alter _libdriver_ in any way (pending the "newlines everywhere" research). My tasks are to (1) revise our erroneous and unclear documentation and (2) assemble a patch for _gropdf_ that you're willing to accept, assuming you lack the time or desire to do so yourself.

> AOB
>
> Many people have praised Branden for his contributions to the documentation,

I don't think you need two hands to count them. :P

> as I do, it just felt wrong to see open criticism of a fellow contributers use of english. I am more than happy for Branden to make our documentation more "pellucid", but I think it is nicer to do it without denigrating previous efforts which were made with the best intentions.

I have tried (I do not claim always successfully) to critique the _code_, not the person. As I understand it, this is an aspect of [https://en.wikipedia.org/wiki/Egoless_programming egoless programming]. You may have observed Alex Colomar expressing a pretty low opinion of the `is_family_valid()` function on the _groff_ list recently. He may not have known at the time that I had written it.
While I was taken aback at first, I did not get upset with him, either on the mailing list or privately. On the contrary, I largely agreed with his assessment. The code's form arose from an unfortunate constraint imposed by our decision (which I guess I'm the main person driving) to stick to ISO C++98.

The same goes for documentation. We all put a bit of ourselves into our written words, but the text is not the author. Ingo Schwarze is another _groff_ contributor who pulls no punches when expressing opinions of code, but I don't remember seeing him engage in personal attacks. I expect the same latitude when evaluating documentation, and I am also prepared to endure criticism of my own product.

All three of us can likely remember Ralph Corderoy's withering assessments of my emails (particularly their length). He seemed to have difficulty believing that I could write concisely. Whether he ever actually read any of the documentation I have written for _groff_ (when it wasn't pitched to the mailing list first--a tiny proportion), I don't know; he never offered any evidence of having done so. Ralph's unrelentingly negative attitude about _groff_ and my work on it (in contrast to other *roffs) irritated me, but I didn't, and don't, let that stop me from considering and crediting such contributions as he makes. In other words, I can work with him. If Bernd should return, I would expect to be able to work with him, too, and I hope he'd reciprocate that.

> The latest incarnation of gropdf (in the deri-gropdf-ng git branch, give it a go :-)) is now 80 lines short of 5000 lines.

So far your work is getting rave reviews. I'm envious! :D

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64360>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/