At 2023-04-29T22:33:52-0500, Dave Kemper wrote:
> On 4/26/23, G. Branden Robinson <g.branden.robin...@gmail.com> wrote:
> > It would probably be a good idea to represent Unicode strings
> > internally using char32_t as a base type anyway, but groff's design
> > under the Unix filter model described above makes the choice less
> > dramatic in terms of increased space consumption than it would
> > otherwise be.
>
> But to keep scalability in mind, this design shouldn't be assumed to
> be immutable.  Implementing the Knuth-Plass (or some other)
> paragraph-at-once algorithm would greatly expand the amount of input
> groff has to remember at once,

Only by about an order of magnitude.  Which sounds like a lot until we
consider how many of those we've gained in memory and persistent storage
bandwidth and CPU instruction retirement rate since the PDP-11.

> and a theoretical future chapter-at-once algorithm (to, for example,
> optimize page layouts to eliminate widows) vastly expands it beyond
> that.

Well, if you format each paragraph in a diversion, you don't need to
expand the formatter's view of the present as much.

> It's possible memory is too cheap to worry about even the worst case,
> where groff 4.38 has to hold an entire document in memory (maybe to
> finally allow it to put the table of contents up front without page
> reordering),

Just in case people fear you're _not_ being facetious, there are better
solutions for this.  One already exists in PDF, and I have proposed a
general solution for all documents.

https://savannah.gnu.org/bugs/?61836

> but it's a question worth considering before making changes to groff's
> fundamental data type.

I disagree with this too.  Part of the value of encapsulation of the
fundamental character type inside a formatter-specific type is that we
can change our minds _again_ if circumstances warrant.

Regards,
Branden
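
A minimal sketch of the encapsulation point above, in C++ since that is
what groff is written in.  It is not groff's actual code: fmt_char,
fmt_string, storage_bytes, and from_ascii are hypothetical names made up
for illustration.  The idea is only that the underlying character type
is spelled in one place, so changing it later would not ripple through
callers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical formatter-specific character type; not from the groff
// sources.  Callers never name the underlying representation.
class fmt_char {
public:
  // The one place the base type is chosen.  Swapping it for something
  // else would leave code that uses only the public interface alone.
  using rep_type = char32_t;

  constexpr fmt_char() : code_(0) {}
  constexpr explicit fmt_char(std::uint32_t code_point)
      : code_(static_cast<rep_type>(code_point)) {}

  constexpr std::uint32_t code_point() const {
    return static_cast<std::uint32_t>(code_);
  }
  constexpr bool is_ascii() const { return code_ < 0x80; }

  friend constexpr bool operator==(fmt_char a, fmt_char b) {
    return a.code_ == b.code_;
  }
  friend constexpr bool operator!=(fmt_char a, fmt_char b) {
    return !(a == b);
  }

private:
  rep_type code_;
};

// Hypothetical string type built on the wrapper.
using fmt_string = std::vector<fmt_char>;

// Bytes occupied by the characters of s (excluding container overhead).
inline std::size_t storage_bytes(const fmt_string &s) {
  return s.size() * sizeof(fmt_char);
}

// Widen a string of ASCII bytes into formatter characters.  Bytes above
// 0x7F are copied through unchanged, so this assumes ASCII-only input.
inline fmt_string from_ascii(const std::string &text) {
  fmt_string out;
  out.reserve(text.size());
  for (unsigned char c : text)
    out.push_back(fmt_char(c));
  return out;
}
```

Code that calls from_ascii("groff") and inspects code_point() or
is_ascii() would not need to change if rep_type did; only
storage_bytes() would report a different number.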