Follow-up Comment #32, bug #64484 (group groff):

At 2024-11-17T11:10:18-0500, Deri James wrote:
> Follow-up Comment #31, bug #64484 (group groff):
>
> [comment #30 comment #30:]
>> At 2024-11-16T12:22:25-0500, Deri James wrote:
>>> Follow-up Comment #29, bug #64484 (group groff):
>>>
>>> Branden leans towards rejecting things even though the users
>>> intention is clear. For groff there are no spaces, only horizontal
>>> movements,
>>
>> I reject this claim as false.
>>
>> It is rebuttable in at least 3 respects:
>
> Hi Branden,
>
> It is a bit odd that you rebut my claim "For groff there are no
> spaces, only horizontal movements" by showing that the statement is in
> fact correct!

I guess you can say that in the same sense that one can claim that
there is no such thing as "if", "for", or "while" control structures,
only branches and jumps truly exist.

I think it's necessary to consider groff as a system from more
perspectives than that of gropdf or any other output driver.

>> I'd warn about it, at least.
>
> So, to be clear, if a user entered:-
>
> .pdfbookmark 1 "Chapter 1"

Nope, because a space in this context is just a space.

Remember, here we are in document metadata land, not text formatting
land. In that context, it is horizontal motions that don't truly
exist. The bookmark title is a sequence of characters that, in
principle, could be exposed to the user in any number of ways. In a
Japanese document, for example, the PDF viewer's "document info" box
might not even arrange the text of the bookmark title in a horizontal
orientation. Is that not conceivable? What about the document title
or author's name?

> You would warn about the space being a horizontal movement. What about
> "Lake Attica’s Shores", would that receive two warnings?

Nah, I'd warn only about the other horizontal motions I identified,
except for \~, which I would probably silently translate to " ".

But maybe not in the long run. If we ever get the string iterator and
a string-processing support library--my notional "string.tmac"--then
I'd happily make the processing of arguments to device extension
commands even stricter.

Metadata is not formatted text. We will only promote confusion in our
users' minds if we seduce them into thinking the two equivalent.

> A user may expect input destined for a bookmark to be processed
> similarly to how groff outputs to a terminal, i.e. \0 is delivered as
> a space.

They might expect that. Other users might be surprised that they were
feeding formatting-rich text into document metadata, and want to
consider alternatives, especially where the widths of the spaces when
formatted are distinguishable.
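
To make the policy described above for metadata text concrete (silently
translate `\~` to a space, warn about other horizontal motions), here is
a minimal Python sketch. It is not groff code. The function name
`sanitize_metadata`, the particular escapes listed in `WARNED_MOTIONS`,
and the choice to substitute a space after warning are all assumptions
made for illustration only.

```python
# Hypothetical classification of two-character escape sequences; the real
# set would be decided by the formatter, not by this sketch.
SILENT_SPACE = {"\\~"}                           # translated without comment
WARNED_MOTIONS = {"\\0", "\\|", "\\^", "\\ "}    # assumed list, warned about


def sanitize_metadata(text):
    """Return (plain_text, warnings) for a bookmark title or similar."""
    out = []
    warnings = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in SILENT_SPACE:
            out.append(" ")
            i += 2
        elif pair in WARNED_MOTIONS:
            # Assumption: substitute a plain space after warning.
            warnings.append(f"horizontal motion {pair} in metadata")
            out.append(" ")
            i += 2
        else:
            # Ordinary characters, including ordinary spaces, pass through.
            out.append(text[i])
            i += 1
    return "".join(out), warnings
```

Under these assumptions an ordinary space in "Chapter 1" produces no
warning, `Chapter\~1` is quietly turned into "Chapter 1", and
`Chapter\01` yields the same text plus one warning.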

> I am afraid I can't imagine another expectation other than \0 will
> result in a space in the text, it is unlikely the expectation would be
> that it is simply ignored.

What should `device_request` do with `\v`? This is not a hypothetical
question--Peter's using it in the document metadata of mom(7) examples
right now.

>> It looks to me like you've added a feature (I assume this is gropdf)
>> to interpret a bespoke selection of *roff input escape sequences
>> within certain device extension commands tagged "ps:exec".
>
> You have been aware of this for a long time, since, when we discussed
> this, well before you embarked on your "death march" (as you call it),

That's a reference to a fairly well-known book by Edward Yourdon, if you
weren't aware.

https://en.wikipedia.org/wiki/Death_march_(project_management)

> and I told you that I wanted you to pass through text "as is" (if
> valid), your advice was that I should not need to do that and you
> planned some looping construct. Which never appeared.

Right. It proved tough. It's still on my drawing board, just not for
groff 1.24.

We can make the feature a prerequisite for 1.24 and delay the release
(and release candidate) indefinitely, or I can flesh out
`device_request()` to do more validation of its argument, and/or either
of us can restore string-sanitization to pdf.tmac (to keep
`device_request()` from throwing warnings on input documents that one
doesn't want to modify, not to obtain correct output).

I'd ask that we hold off on restoring sanitization until I've altered
`device_request()` and see how intolerable the warning situation is
(and whether/how bookmark content is adversely affected).

>> Where is this behavior documented? Should *roff users in general
>> expect output drivers to implement *roff language interpreters, in
>> whole or part?
>
> Previous pdf.tmac used .asciify so this prevented unicode bookmarks,
> it is documented that these are now accepted, which entails parsing
> strings passed to gropdf.

Yes; all I envisioned was that it would interpret `\[uXXXX]` escape
sequences.

>>> Worryingly .device in HEAD manages to convert \0 to just "0"
>>> which does seem wrong.
>>
>> Yes, that does seem odd and is worth investigating. Probably the
>> dummy character should be silently discarded too. And thus likely
>> some other things, like `\)`. Argh. This is the "switch out of
>> copy mode back into interpretation mode, or some kind of 'mode 3'"
>> death march you counseled me not to waste my time with,
>> characterizing it as a self-imposed goal that would only delay the
>> release. I believe the implication was that no one else would care
>> about it.
>>
>> Well, on the bright(?) side, the release might be delayed for a
>> while anyway, given that I have some documentation to write and, in
>> your assessment, a low order of intellect with which to compose it.
>>
>> https://lists.gnu.org/archive/html/groff/2024-11/msg00131.html
>
> I don't think I have ever cast aspersions about your ability to
> document groff,

Okay, so just on my ability to write code? This despite repeated quick
turnarounds on bug fixes and root-cause analysis, in one case winning
the epithet "Gunga Din" from you. I grant--given Kipling's
attitudes--that might not have been a compliment. ;-)

> so I have no doubt you would make a great job replacing pdfmark.ms,
> combining stuff from gropdf.1 and comments in pdf.tmac. In fact I know
> you are so thorough it is likely you are likely to find plenty of bugs
> around the seldom used edges of pdf.tmac.

I would hope to turn them into automated tests, if it's tractable to do
so. Not only on general principle, but because you struggle to maintain
calm when I inadvertently regress anything to do with gropdf.

> The message you reference does not question your intellect.
> On a number of occasions you have stated you found the coding style
> of pdfmark.tmac/pdf.tmac opaque and difficult to navigate. I
> sympathise - it is - it took me awhile to fully understand Keiths
> code, but I do admire it. Any criticism in that post is not aimed at
> your intellect, rather it is a disappointment you fired off about 16
> commits to pdf.tmac just after I went on sabbatical, so no
> discussion, to pdf.tmac, a file which you said you had difficulty
> with, which resulted in bugs being introduced.

The alternative was to wait an indefinite amount of time for your
return.

Stuff that is not in groff's contrib/ is team-maintained.[1] We're
both members of the groff development team. You have occasionally
surprised me with commits to changes elsewhere in the tree, but I
haven't derogated you in Savannah tickets or the mailing lists about it.
I've simply fixed what I thought needed fixing.

I will reiterate an offer I've made multiple times before: If you want
the arrangement Peter Schaffter enjoys, where I (and others) don't touch
"om.tmac" and its examples and HTML documentation without prior
arrangement with him, you can have it. I'll move the
"src/devices/gropdf" directory to "contrib", update our Automake files,
learn from you which files (if any) you don't mind other people
touching, and we'll continue from there.

> The fault is not a lack of intellect, if there is one, it may be in
> over confidence in changing a file which you claimed to not fully
> understand.

You have an inflated picture of my level of confidence in my changes.
It's typically low. That's why I write automated tests. I outsource
my confidence to an independently reproducible scheme of empirical
tests.

In my experience as a professional software engineer, a personal feeling
of confidence in a code change means little and is a poor predictor of
correctness or code quality. People change projects and jobs, and
forget things they knew 6 weeks ago, let alone 6 months.

One solution to this problem is to have designated masters/gurus who are
assumed to possess a complete understanding of a system (or portion
thereof), and run every applicable change request through them. This
arrangement, while popular, is fragile. Such people (a) move on as
described in the previous paragraph and (b) sometimes prove to have less
mastery of the code than was believed by others.

Another solution is to assume nothing you can't demonstrate--at which
point it's no longer an assumption, but a premise upon which further
reasoning about the system can be grounded.

One assumption you appear to cling fast to that you might consider
giving up is that I don't give a damn if I break gropdf or its output.

>>> There have always been differences between \X and .device but after
>>> Branden's extensive changes in this area I'm not sure our
>>> documentation has caught up, since it appears to now say there is
>>> no difference,
>>
>> Where does it say that?
>
> I read the entry in groff.pdf too fast, missed the fact that later
> paragraphs only applied to \X.

`\X` and `device` differ in their argument handling much more than I'd
like. (The reason is fundamental to *roff input processing: `device`
reads its argument in copy mode and `\X` does not.) Much of my effort
this year has gone into concealing those differences from the user, who
(for example) just wants to put their dang bookmarks in the output (or
have them autogenerated by a macro package) and will be baffled if they
have to process or encode those bookmarks differently depending on which
formatter feature they use. One doesn't spell a font name differently
depending on whether one uses `\f` or `.ft`, for instance.

Long-term, I hope to make `\X` and `device` much more similar under the
hood. If I ever achieve that, I'll be sure to announce the fact.

>>> commit 2548c4659c appears to be the culprit, also affects -T html.
>>
>> How does grohtml interpret the following?
>
> grohtml uses use_charnames_in_special so I was trying to give more
> information to pin down the particular code path causing the issue.

I'm already working under the assumption that the problem is in
`device_request()`, which does not operate differently depending on the
output device.

I have in mind line 6025,[2] where the
not-quite-a-finite-state-machine[3] knows that it has seen a backslash.
The problem, or part of it, is that I don't do something sensible (like
throw a warning) with a character immediately after the backslash if it
isn't `[`, because I suffered an attack of tunnel vision while trying to
get the desired use case to work.

I may be fanatical about input validation, but I get distracted from
that preoccupation sometimes.

Regards,
Branden

[1] also stuff that _is_ in contrib/ when its author/maintainer has
    disappeared

[2] https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?id=f107252c3c0d3b70236e0db5a56238a406231bf5#n6025

[3] Formally, it indeed is one, but some people call an FSM an FSM only
    if it's implemented using `switch/case`.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64484>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
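
As a postscript to the backslash handling discussed above: the missing
step (doing something sensible when the character after a backslash is
not `[`) can be sketched as a small scanner. This is a Python
illustration, not the C++ in input.cpp. The function name
`scan_device_argument`, the warning texts, and the decision to discard
the unrecognized escape sequence are assumptions; the message above
commits only to throwing a warning.

```python
def scan_device_argument(arg):
    r"""Copy `arg`, keeping \[...] escapes and warning on other escapes.

    Returns (output_text, warnings).
    """
    out = []
    warnings = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # State: a backslash has been seen; inspect the next character.
        nxt = arg[i + 1:i + 2]
        if nxt == "[":
            end = arg.find("]", i + 2)
            if end == -1:
                warnings.append("unterminated '\\[' escape sequence")
                break
            # Keep the whole escape for the output driver to interpret.
            out.append(arg[i:end + 1])
            i = end + 1
        else:
            # Assumption: warn and drop the escape instead of emitting
            # the bare character after the backslash.
            warnings.append(f"ignoring escape sequence '\\{nxt}'")
            i += 1 + len(nxt)
    return "".join(out), warnings
```

With this shape, an input such as `a\0b` would come out as "ab" with one
warning, not as "a0b" with none, while `\[u00E9]` passes through
untouched.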