On 4/26/23, G. Branden Robinson <g.branden.robin...@gmail.com> wrote:
> It would probably be a good idea to represent Unicode strings internally
> using char32_t as a base type anyway, but groff's design under the Unix
> filter model described above makes the choice less dramatic in terms of
> increased space consumption than it would otherwise be.
But with scalability in mind, this design shouldn't be assumed to be immutable. Implementing the Knuth-Plass (or some other) paragraph-at-once algorithm would greatly expand the amount of input groff has to remember at once, and a theoretical future chapter-at-once algorithm (for example, to optimize page layout so as to eliminate widows) would expand it vastly beyond that. It's possible that memory is too cheap to worry about even the worst case, where groff 4.38 has to hold an entire document in memory (perhaps to finally let it put the table of contents up front without page reordering), but it's a question worth considering before changing groff's fundamental data type.