At 2025-03-10T14:57:24-0500, Dave Kemper wrote:
> On Sun, Mar 9, 2025 at 11:03 AM G. Branden Robinson
> <g.branden.robin...@gmail.com> wrote:
> > it appears that "removing" characters may have never really worked,
> > because the act of looking one up created it.
> 
> Do you have sample input that demonstrates this?

I do not, and I now believe I was incorrect.

> Two simple tests--using the "c" conditional, and attempting to use the
> character--seem to show .rchar working fine.
> 
> $ cat character_removal
> .nf
> .char \[unhappy] :-(
> .if c \[unhappy] .tm Unhappiness defined.
> I feel \[unhappy]
> .rchar \[unhappy]
> .if c \[unhappy] .tm Unhappiness still defined.
> I still feel \[unhappy]
> $ groff -Tascii character_removal | cat -s
> Unhappiness defined.
> troff:character_removal:7: warning: special character 'unhappy' not defined
> I feel :-(
> I still feel
> 
> This output is the same (other than changes to the wording of the
> warning) from groff 1.19.2 to 1.23.  (It fails in git HEAD because of
> #66675, but changing the name of the character to not start with
> lowercase "u" makes it produce the same results as well.)

I made my claim based solely upon looking at the code.

(tl;dr: But I didn't look closely enough.)

Examining groff 1.23.0, before I mucked with this stuff:

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n4321

`remove_character()`, the `rchar` request handler, calls
`tok.get_char(true)`, where the `true` argument means that a character
is expected on the input stream and should be warned about if missing.

`tok` is an object of the `token` class.  Its `get_char()` member
function is defined here.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n7248

You've defined a special character (`\[unhappy]`), so the branch at like
7252 will be taken.  That calls the `get_charinfo()` function.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n8961

This function performs a `lookup()` of its argument (a `symbol` `nm`) in
the `dictionary` `charinfo_dictionary`.  groff long predates the C++
STL, so except for a handful of later developments by Werner and me, all
of its container classes are bespoke.

That `dictionary` is declared here...

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n8959

...and the class is implemented in its own file.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/dictionary.cpp?h=1.23.0

`lookup()` is defined on line 44, after a citation to Knuth's TAOCP for
the approach.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/dictionary.cpp?h=1.23.0#n44

Actually, because all we passed in was a `symbol` `nm`, and this form of
the (overloaded) `lookup` function takes a second argument `v`, we
should confirm that the second argument is given a default value in a
header file...

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/dictionary.cpp?h=1.23.0#n44

...and sure enough it does--`0`, the pre-C++11 way to spell a null
pointer (and the traditional C and C++ way to spell any scalar type
represented with a bit pattern of all zeroes in the machine[1]).

So, returning to dictionary.cpp...

...we have some mathy stuff that looks plausibly like a lookup of the
symbol `nm` by its hash value and...ah, you caught me in an error.

Lines 53--63 are the key.  I'll annotate.  (As openhub.net says:

"GNU troff...

...is mostly written in C++
   with a low number of source code comments.")

src/roff/troff/dictionary.cpp:

    if (s == table[i].s) { // This means we got a match on our `nm`.
      if (v != 0) { // If `v` is not a null pointer...
        void *temp = table[i].v; // ...make it the new value of `nm`...
        table[i].v = v;
        return temp; // ...and return the old one.
      }
      else
        return table[i].v; // Otherwise, return existing value of `nm`.
    }
    if (v == 0) // We fall through to here if `nm` has no match.
      return 0;

So, you're right!  And now I understand groff's dictionaries a little
better.  I don't regret implementing a new `suppress_creation` flag for
character lookup, though...

commit 9a9d7b55b6cc565ed2c065b8e993bdac9fb576d1
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Fri Feb 21 04:05:40 2025 -0600

    [troff]: Create characters less aggressively.
    
    Make it possible to retrieve information about a *roff character without
    creating it as a side effect.
    
    * src/roff/troff/charinfo.h (get_charinfo, get_charinfo_by_index):
    * src/roff/troff/token.h (get_char):
    * src/roff/troff/input.cpp (get_charinfo_by_index): Add new `bool`
      argument `lookup_only`, defaulting `false`.
    
    * src/roff/troff/input.cpp: Use these new facilities.
    
      (report_character_request): Prevent creation of each character we look
      up.  Improve diagnostics.  Report information about indexed characters
      (`\N'123'`) more intelligibly.
    
      (remove_character): Prevent creation of each character we remove.
      Intelligibly report nonexistence of characters for which removal is
      attempted.  Throw error when attempt is made to remove a
      non-character, like `\~`.

commit 889fb755b53e4e41c09a68163e4b2b4f1bac0d4b
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Thu Feb 27 02:43:14 2025 -0600

    [troff]: Trivially refactor.

    * src/roff/troff/input.cpp (report_character_request, remove_character)
      (get_char, get_charinfo, get_charinfo_by_index): Rename `lookup_only`
      function arguments to `suppress_creation`, a name already used by
      register classes for the same purpose.

...because long-term, I want us to be able to abandon our bespoke
dictionary implementation in favor of an STL container.

See, for example:

https://en.cppreference.com/w/cpp/container/unordered_map

...for a C++ container close to what groff implements already, and

https://en.cppreference.com/w/cpp/container/map

...for an ordered one, which we might want just because it could make
handling dictionary dumpers like `pnr`, `pm`, `pcomposite`, and `phw`
more pleasant to deal with.  (The items won't come out in an essentially
random order.)  ...if it doesn't hurt performance too badly, or require
bumping to too new a version of C++.  I've started to get a notion of
just how kludgey some aspects of C++98 are,[2] so some things might not
be worth doing until we feel we can stake groff's future on C++11.

But I do need to modify that ChangeLog entry to tell the truth, so I'll
do that.

Thanks for prompting me to dig into this.

Regards,
Branden

[1] ...be it an integer, a floating-point value, a Boolean, a data
    pointer, a function pointer, or any of an unbounded set of sum types
    (think `enum`s) because Real Programmers don't use type systems,
    grumble grumble grumble

[2] When I started to dig more deeply into C++ a few years ago so that I
    could better contribute to groff, I read Lippman's _Essential C++_
    to spin myself up on C++98-era features and practices.  (I also got
    Stroustrup's Special Edition, but, like me in email, he's a chatty
    writer who feels compelled to warn the youth of all the dire
    mistakes one can make.[3])  When I ran into std::bind1st and
    std::bind2nd (the language's way of implementing currying, as I
    understand it), my face twisted into a rictus of torment.

    https://en.cppreference.com/w/cpp/utility/functional/bind12

    Fortunately, C++ luminaries acknowledged just how naff those ideas
    were and replaced them, deprecating and ultimately withdrawing them.

    What OO and functional programming skills I have, I largely acquired
    via Python 2.  C++98 feels like a step down from that; C++11 might
    not be.  I'll decide when I know more.

[3] K&R, by contrast, tersely spec out the chainsaw feature of C and
    leave the reader to their own devices.

    https://darwinawards.com/darwin/darwin1996-07.html

Attachment: signature.asc
Description: PGP signature

Reply via email to