GNU troff's fundamental character type (was: neatroff for Russian)

G. Branden Robinson Sat, 06 May 2023 20:06:49 -0700

At 2023-04-29T22:33:52-0500, Dave Kemper wrote:
> On 4/26/23, G. Branden Robinson <g.branden.robin...@gmail.com> wrote:
> > It would probably be a good idea to represent Unicode strings
> > internally using char32_t as a base type anyway, but groff's design
> > under the Unix filter model described above makes the choice less
> > dramatic in terms of increased space consumption than it would
> > otherwise be.
> 
> But to keep scalability in mind, this design shouldn't be assumed to
> be immutable.  Implementing the Knuth-Plass (or some other)
> paragraph-at-once algorithm would greatly expand the amount of input
> groff has to remember at once,

Only by about an order of magnitude, which sounds like a lot until we
consider how many orders of magnitude we've gained in memory and
persistent-storage bandwidth and in CPU instruction retirement rate
since the PDP-11.

> and a theoretical future chapter-at-once algorithm (to, for example,
> optimize page layouts to eliminate widows) vastly expands it beyond
> that.

Well, if you format each paragraph in a diversion, you don't need to
expand the formatter's view of the present as much.

> It's possible memory is too cheap to worry about even the worst case,
> where groff 4.38 has to hold an entire document in memory (maybe to
> finally allow it to put the table of contents up front without page
> reordering),

Just in case people fear you're _not_ being facetious, there are better
solutions for this.  One already exists in PDF, and I have proposed a
general solution for all documents.

https://savannah.gnu.org/bugs/?61836

> but it's a question worth considering before making changes to groff's
> fundamental data type.

I disagree with this too.  Part of the value of encapsulation of the
fundamental character type inside a formatter-specific type is that we
can change our minds _again_ if circumstances warrant.
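
To make the encapsulation point concrete, here is a minimal C++ sketch.
It is not groff's actual code: the names glyph_code, storage_type, and
count_non_ascii are hypothetical and chosen only for illustration.  The
idea is that callers see only the wrapper, so the underlying
representation is named in exactly one place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical formatter-specific character type (not groff's real class).
class glyph_code {
public:
  // The only line that names the underlying representation.  Switching
  // to another integer type later would not touch any caller.
  using storage_type = char32_t;

  explicit glyph_code(std::uint32_t code_point)
    : value_(static_cast<storage_type>(code_point))
  {
    // Unicode code points end at U+10FFFF.
    assert(code_point <= 0x10FFFF);
  }

  std::uint32_t code_point() const
  {
    return static_cast<std::uint32_t>(value_);
  }

  bool operator==(const glyph_code &other) const
  {
    return value_ == other.value_;
  }

private:
  storage_type value_;
};

// Example caller: it uses only the wrapper's interface, never storage_type.
std::size_t count_non_ascii(const std::vector<glyph_code> &line)
{
  std::size_t n = 0;
  for (const glyph_code &g : line)
    if (g.code_point() > 0x7F)
      ++n;
  return n;
}
```

Under these assumptions, a line holding U+0041, U+0416, and U+1F600
would report two non-ASCII characters, and that caller would compile
unchanged if storage_type were later redefined.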

Regards,
Branden
