At 2025-03-10T14:57:24-0500, Dave Kemper wrote: > On Sun, Mar 9, 2025 at 11:03 AM G. Branden Robinson > <g.branden.robin...@gmail.com> wrote: > > it appears that "removing" characters may have never really worked, > > because the act of looking one up created it. > > Do you have sample input that demonstrates this?
I do not, and I now believe I was incorrect. > Two simple tests--using the "c" conditional, and attempting to use the > character--seem to show .rchar working fine. > > $ cat character_removal > .nf > .char \[unhappy] :-( > .if c \[unhappy] .tm Unhappiness defined. > I feel \[unhappy] > .rchar \[unhappy] > .if c \[unhappy] .tm Unhappiness still defined. > I still feel \[unhappy] > $ groff -Tascii character_removal | cat -s > Unhappiness defined. > troff:character_removal:7: warning: special character 'unhappy' not defined > I feel :-( > I still feel > > This output is the same (other than changes to the wording of the > warning) from groff 1.19.2 to 1.23. (It fails in git HEAD because of > #66675, but changing the name of the character to not start with > lowercase "u" makes it produce the same results as well.) I made my claim based solely upon looking at the code. (tl;dr: But I didn't look closely enough.) Examining groff 1.23.0, before I mucked with this stuff: https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n4321 `remove_character()`, the `rchar` request handler, calls `tok.get_char(true)`, where the `true` argument means that a character is expected on the input stream and should be warned about if missing. `tok` is an object of the `token` class. Its `get_char()` member function is defined here. https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n7248 You've defined a special character (`\[unhappy]`), so the branch at like 7252 will be taken. That calls the `get_charinfo()` function. https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n8961 This function performs a `lookup()` of its argument (a `symbol` `nm`) in the `dictionary` `charinfo_dictionary`. groff long predates the C++ STL, so except for a handful of later developments by Werner and me, all of its container classes are bespoke. That `dictionary` is declared here... https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n8959 ...and the class is implemented in its own file. https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/dictionary.cpp?h=1.23.0 `lookup()` is defined on line 44, after a citation to Knuth's TAOCP for the approach. https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/dictionary.cpp?h=1.23.0#n44 Actually, because all we passed in was a `symbol` `nm`, and this form of the (overloaded) `lookup` function takes a second argument `v`, we should confirm that the second argument is given a default value in a header file... https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/dictionary.cpp?h=1.23.0#n44 ...and sure enough it does--`0`, the pre-C++11 way to spell a null pointer (and the traditional C and C++ way to spell any scalar type represented with a bit pattern of all zeroes in the machine[1]). So, returning to dictionary.cpp... ...we have some mathy stuff that looks plausibly like a lookup of the symbol `nm` by its hash value and...ah, you caught me in an error. Lines 53--63 are the key. I'll annotate. (As openhub.net says: "GNU troff... ...is mostly written in C++ with a low number of source code comments.") src/roff/troff/dictionary.cpp: if (s == table[i].s) { // This means we got a match on our `nm`. if (v != 0) { // If `v` is not a null pointer... void *temp = table[i].v; // ...make it the new value of `nm`... table[i].v = v; return temp; // ...and return the old one. } else return table[i].v; // Otherwise, return existing value of `nm`. } if (v == 0) // We fall through to here if `nm` has no match. return 0; So, you're right! And now I understand groff's dictionaries a little better. I don't regret implementing a new `suppress_creation` flag for character lookup, though... commit 9a9d7b55b6cc565ed2c065b8e993bdac9fb576d1 Author: G. Branden Robinson <g.branden.robin...@gmail.com> Date: Fri Feb 21 04:05:40 2025 -0600 [troff]: Create characters less aggressively. Make it possible to retrieve information about a *roff character without creating it as a side effect. * src/roff/troff/charinfo.h (get_charinfo, get_charinfo_by_index): * src/roff/troff/token.h (get_char): * src/roff/troff/input.cpp (get_charinfo_by_index): Add new `bool` argument `lookup_only`, defaulting `false`. * src/roff/troff/input.cpp: Use these new facilities. (report_character_request): Prevent creation of each character we look up. Improve diagnostics. Report information about indexed characters (`\N'123'`) more intelligibly. (remove_character): Prevent creation of each character we remove. Intelligibly report nonexistence of characters for which removal is attempted. Throw error when attempt is made to remove a non-character, like `\~`. commit 889fb755b53e4e41c09a68163e4b2b4f1bac0d4b Author: G. Branden Robinson <g.branden.robin...@gmail.com> Date: Thu Feb 27 02:43:14 2025 -0600 [troff]: Trivially refactor. * src/roff/troff/input.cpp (report_character_request, remove_character) (get_char, get_charinfo, get_charinfo_by_index): Rename `lookup_only` function arguments to `suppress_creation`, a name already used by register classes for the same purpose. ...because long-term, I want us to be able to abandon our bespoke dictionary implementation in favor of an STL container. See, for example: https://en.cppreference.com/w/cpp/container/unordered_map ...for a C++ container close to what groff implements already, and https://en.cppreference.com/w/cpp/container/map ...for an ordered one, which we might want just because it could make handling dictionary dumpers like `pnr`, `pm`, `pcomposite`, and `phw` more pleasant to deal with. (The items won't come out in an essentially random order.) ...if it doesn't hurt performance too badly, or require bumping to too new a version of C++. I've started to get a notion of just how kludgey some aspects of C++98 are,[2] so some things might not be worth doing until we feel we can stake groff's future on C++11. But I do need to modify that ChangeLog entry to tell the truth, so I'll do that. Thanks for prompting me to dig into this. Regards, Branden [1] ...be it an integer, a floating-point value, a Boolean, a data pointer, a function pointer, or any of an unbounded set of sum types (think `enum`s) because Real Programmers don't use type systems, grumble grumble grumble [2] When I started to dig more deeply into C++ a few years ago so that I could better contribute to groff, I read Lippman's _Essential C++_ to spin myself up on C++98-era features and practices. (I also got Stroustrup's Special Edition, but, like me in email, he's a chatty writer who feels compelled to warn the youth of all the dire mistakes one can make.[3]) When I ran into std::bind1st and std::bind2nd (the language's way of implementing currying, as I understand it), my face twisted into a rictus of torment. https://en.cppreference.com/w/cpp/utility/functional/bind12 Fortunately, C++ luminaries acknowledged just how naff those ideas were and replaced them, deprecating and ultimately withdrawing them. What OO and functional programming skills I have, I largely acquired via Python 2. C++98 feels like a step down from that; C++11 might not be. I'll decide when I know more. [3] K&R, by contrast, tersely spec out the chainsaw feature of C and leave the reader to their own devices. https://darwinawards.com/darwin/darwin1996-07.html
signature.asc
Description: PGP signature