Hi onf, At 2024-11-18T21:43:24+0100, onf wrote: > > The alternative diff is simple: > > [...] > > + warning (WARN_SYNTAX, "not interpreting escaped '%1' in device" > > + " extension command request argument", char(c1)); > > + else > > + warning(WARN_SYNTAX, "not interpreting escaped control character" > > + " (code %1) in device extension command request" > > + " argument", c1); > > Just a small nitpick, but you should probably do s/(warning) /\1/.
Yes, that was a code style goof. Thanks! Fixed. > I would prefer the former IF it removed their arguments. Since it > doesn't and the removal is only partial, I prefer the latter. Me too. I don't want to reimplement delimited escape sequence parsing here, and the only other place it's done is inside the _gigantic_ `token::next()` function which has a MAERSK-load of state in it, so it's really hard to factor out that logic. Long-term, maybe the right choice for `\X` and `device` is to have them both read their arguments in interpretation mode (not copy mode) and warn on any node type they encounter in the argument, except for a special character. But I seem to recall having tried that path before and discovered other frustrations. It's like an integer linear programming problem. Any one constraint I satisfy violates two others. Here's the current shape of the #64484 "fix" (I don't doubt that part of that ticket will live on beyond the current crisis) in my working tree: diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp index 8fafa8a0a..5d9dcfb81 100644 --- a/src/roff/troff/input.cpp +++ b/src/roff/troff/input.cpp @@ -6016,14 +6016,31 @@ static void device_request() // Null characters can correspond to node types like vmotion_node that // are unrepresentable in a device extension command, and got scrubbed // by `asciify`. - for (int prevc = c; (c != '\0') && (c != '\n') && (c != EOF); + for (; + (c != '\0') && (c != '\n') && (c != EOF); c = get_copy(0 /* nullptr */)) { - if (!('\\' == c) && (prevc != '\\')) + if (c != '\\') mac.append(c); else { int c1 = get_copy(0 /* nullptr */); - if (c1 != '[') + if (c1 != '[') { + mac.append(c); mac.append(c1); + string chardesc = ""; + if (csprint(c1)) { + chardesc += "'"; + chardesc += char(c1); + chardesc += "'"; + } + else { + chardesc += "character code "; + chardesc += i_to_a(c1); + } + chardesc += '\0'; // make it safe for .contents() + warning (WARN_SYNTAX, "not interpreting escaped %1 in device" + " extension command request argument", + chardesc.contents()); + } else { // Does the input resemble a valid (bracket-form) special // character escape sequence? I'll this opportunity to share a lesson regarding groff's internal classes that is, as far as I can tell, documented nowhere and that I learned the hard way just this year. Ya see this line? + chardesc += '\0'; // make it safe for .contents() GNU troff/libgroff ubiquitously uses two classes that both advertise a `contents()` member function. The `symbol` class implements it. So does the `string` class. Roughly speaking, the `symbol` class gives you a dictionary keyed on `const char *` with values that are also `const char *`. The `string` class is what it says, and does something Stroustrup encouraged everyone to do in the pre-ISO days of C++ when groff was first written. "Roll your own!" he enthused. "C++ makes memory management and automatically growing vectors easy! Just use the free store!" (He never calls dynamic storage a "heap", to this day.) "You can tune your custom string class to do only what you need! C++ imposes no overhead on you! Zero-cost abstractions! You give up no speed! It's as fast as C!" Sales pitch, sales pitch, sales pitch. James Clark listened and followed recommended best practice. So anyway, both classes have `contents()` member functions. And why not? Isomorphism for the win, right? Operations that do the same thing should have the same names, right? '<<' means "left shift", right? Also "write stuff to a stream" or "generic output operator"... What? Also look at Java and its debacle with a multiplicity of different ways for getting the lengths of things.[1] But I digress. Yeah, no, in groff `contents()` is a land mine. Whereas the `symbol` class's `contents()` member function will return you a `const char *` that is null-terminated... ...the `string` class's `contents()` member function will return you a `const char *` that ISN'T NECESSARILY null-terminated. (Why would it need to be null-terminated? It's a "smart" string data type with just three private member variables--`ptr`, `len`, and `sz`--that mean exactly what you'd expect!) If I were the better programmer that I think Deri expects me to be, I reckon I'd have seen that hazard from miles away. Well, I know now. My arms and legs are stitched back on and I hobble shakily forward through the groff minefield... (There might not be evidence of me getting blown up [this time] in a Savannah ticket. On that occasion it may have taken place only in my working copy and no trace of my bloody giblets made it into groff's Git repository. I let this anecdote inform the reader's conclusions regarding the level of care I take with changes to the groff codebase.) Regards, Branden [1] It's me talking so, yes, I'm going to tell you how Ada did this right. In Ada there is a general property mechanism called "attributes". Essentially this is a language-level curated lexicon. So, for any data type to which the concept of "length" might apply, you can get at it with the "tick" symbol ' suffixed to an object of that type (and sometimes just the name of the type itself). Here it is in the curses version of "Hello, world!" in Ada. with Terminal_Interface.Curses; use Terminal_Interface.Curses; procedure Hello is Visibility : Cursor_Visibility := Invisible; Message : constant String := "Hello, World!"; done : Boolean := False; c : Key_Code; begin Init_Screen; Set_Echo_Mode(False); Set_Cursor_Visibility(Visibility); Set_Timeout_Mode(Standard_Window, Non_Blocking, 0); Move_Cursor(Line => Lines / 2, Column => (Columns - Message'Length) / 2); Add(Str => Message); while not done loop c := Get_Keystroke(Standard_Window); case c is when Character'Pos('q') => done := True; when others => null; end case; Nap_Milli_Seconds(50); end loop; End_Windows; end Hello; And in case you were wondering, yes, since the length of the constant String `Message` is known at compile time, it's computed then too. Over a decade before C++ (and C) weenies invented `constexpr`, Ada distinguished statically computed expressions from dynamically computed ones, and also formalized a third period at which computation can occur, that being the fixation of expressions or properties of language objects at the time an object file is loaded into memory and does not need to be recomputed at any time during execution. This stage, Ada calls "elaboration" and, beyond construction of objects visible in the global scope in C-like OOP languages (and which can do arbitrary things), I'm not aware of a conceptual counterpart in more recent languages. I'd be happy to be corrected.
signature.asc
Description: PGP signature