URL: <https://savannah.gnu.org/bugs/?66671>
Summary: explore removal of "cset.{h,cpp}" libgroff facility
Group: GNU roff
Submitter: gbranden
Submitted: Thu 16 Jan 2025 12:58:57 AM UTC
Category: Core
Severity: 1 - Wish
Item Group: Lint
Status: Postponed
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: Thu 16 Jan 2025 12:58:57 AM UTC By: G. Branden Robinson <gbranden>

These class-based wrappers around the C standard library's `isalpha()`, `isspace()` and friends make _groff_'s code a little less accessible to experienced C/C++ programmers and don't **appear** to be delivering any benefit.

I have a **guess** for why they're here. They date back to the dawn of the repository (1991), and I'll bet they go all the way back to 1989. This is before the standard C library grew i18n/l10n support. That came with ISO C95, a mostly overlooked revision of the language. Before that time, you could only count on the standard C "ctype.h" functions to tell you what was true of ASCII.

Possibly, Clark needed this for ISO 8859-1 support; GNU _troff_ assumed an 8-bit input coding coming out of the gate and ISO Latin-1 was known to be only one of several possibilities.[1]

However, as far as I know, a need for the locale-specific ctype functions/classes never eventuated. Alternative input character encodings were handled by issuing `trin` requests to the formatter at startup (through macro files loaded via _troffrc_).

Nowadays, C requires that if you've called `setlocale()`, those functions will tell you the truth applicable to your character encoding (and/or language).

Further, it's my intention to rip support for ISO 8859-1 per se _out_ of the formatter and then, after a deprecation cycle, make it interpret UTF-8 instead.
(A dead period of ASCII-only support is, I suspect, advisable to reduce the amount of mojibake produced by users' old Latin-X documents.)

And another thing! This sort of work, even if it still needs to be done, is not a concern specific to _groff_. It should be offloaded to _gnulib_ or similar.

Postponing to the 1.25 release cycle.

I am **not** certain of my historical surmises above. They should be better substantiated before this proceeds.

[1] And it needed to know which code points in the upper half of ISO 8859-1 were letters (_isalpha_()) because those could be candidate hyphenation points whereas their complement would not be. The direction of GNU _troff_'s development of hyphenation is to base hyphenation codes on _language_, not _character encoding_.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66671>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/