[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

G. Branden Robinson Thu, 19 Mar 2026 11:15:00 -0700

Follow-up Comment #14, bug #67735 (group groff):

At 2026-03-19T13:50:36-0400, Dave wrote:
> Follow-up Comment #13, bug #67735 (group groff):
>
> [comment #0 original submission:]
>> It's necessary to nail this down to migrate the underlying
>> representation type to something wide enough to hold Unicode
>> code points.
>
> Said migration is now bug #68129, which per the above I've made
> dependent on this ticket.


I guess I could illuminate my plans here.

Step 1: Migrate handling of input characters to `unsigned char`.

Step 2: Create new structure type `grochar`, consisting solely of an
        `unsigned char` initially.  This is to make the type opaque to
        C++'s C-based legacy type system, which aggressively
        interconverts integral types.  `typedef` is one of the most
        misleading programming language keywords ever devised.

Step 3: Convert `grochar` into a more elaborate `class` or `struct` that
        uses a wider type (likely `char32_t`) internally and includes a
        constructor handling _signed_ characters read from input.
        Because UTF-8 is a variable-length encoding, I don't see how we
        can handle it without tightly coupling this type with GNU
        troff's input stream reader code.  Either this class or the
        string and file stream readers must be prepared to "pump" the
        input stream, collecting enough bytes to decide whether a
        (variable-length) input character is valid.

Step 4: ???
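
To make Steps 2 and 3 a little more concrete, here is a rough sketch of
the shapes involved.  None of this is code from the groff tree: the
members of `grochar`, and the names `grochar32`, `byte_source`, and
`read_code_point`, are invented for illustration, and the real design
may well look different.  It assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Step 2 shape: a struct holding only an unsigned char.
struct grochar {
  unsigned char c;
  explicit grochar(unsigned char ch) : c(ch) {}
};

// Unlike a typedef, the struct does not silently convert to or from int.
static_assert(!std::is_convertible<int, grochar>::value, "no int -> grochar");
static_assert(!std::is_convertible<grochar, int>::value, "no grochar -> int");

// Step 3 shape (hypothetical name): a wider value inside.
struct grochar32 {
  char32_t cp;
  explicit grochar32(char32_t v) : cp(v) {}
  // Plain char may be signed; go through unsigned char so that a byte
  // such as 0xE9 becomes 233 instead of being sign-extended.
  explicit grochar32(char ch) : cp(static_cast<unsigned char>(ch)) {}
};

// Hypothetical stand-in for an input stream: yields bytes, or -1 at end.
struct byte_source {
  std::string s;
  std::size_t pos;
  explicit byte_source(const std::string &str) : s(str), pos(0) {}
  int peek() const
  { return pos < s.size() ? static_cast<unsigned char>(s[pos]) : -1; }
  int get() { int b = peek(); if (b >= 0) ++pos; return b; }
};

// Read one UTF-8 sequence, "pumping" the source for continuation bytes.
// Returns false at end of input or on a malformed sequence; a real
// reader would want to tell those two cases apart.
bool read_code_point(byte_source &in, char32_t &out)
{
  int b = in.get();
  if (b < 0)
    return false;
  int extra;      // continuation bytes still expected
  char32_t cp;    // code point under construction
  char32_t min;   // smallest value allowed for this sequence length
  if (b < 0x80) { out = static_cast<char32_t>(b); return true; }
  else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
  else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
  else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
  else
    return false; // stray continuation byte or invalid lead byte
  while (extra-- > 0) {
    int c = in.peek();
    if (c < 0 || (c & 0xC0) != 0x80)
      return false; // truncated sequence; offending byte is not consumed
    in.get();
    cp = (cp << 6) | static_cast<char32_t>(c & 0x3F);
  }
  // Reject overlong forms, surrogates, and values beyond U+10FFFF.
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return false;
  out = cp;
  return true;
}
```

The decoder cannot judge a character from its first byte alone, which
is the coupling with the reader that Step 3 worries about: it has to
keep pulling bytes until the sequence is complete or provably broken.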



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67735>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
