> 2. Perhaps it is a good point of view to see troff (gtroff) as an
> engine which handles _glyphs_, not characters, in a given context of
> typographic style and layout. The current glyph is defined by the
> current point size, the current font, and the name of the
> "character" which is to be rendered, and troff necessarily takes
> account of the metric information associated with this glyph.
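To make the glyph/character distinction in the quoted paragraph concrete, here is a minimal Python sketch. It is not taken from groff. A glyph is identified by the triple (font, point size, character name), and its width comes from an external metrics table scaled by the point size. The class `GlyphKey`, the table `FONT_WIDTHS`, and the function `glyph_width` are hypothetical names. The widths are illustrative values in thousandths of an em.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphKey:
    """Hypothetical glyph identity: font + point size + character name."""
    font: str          # current font, e.g. "TR" (illustrative)
    point_size: float  # current point size
    char_name: str     # name of the "character" to render, e.g. "a" or "hy"


# Hypothetical external metrics table, as a font description might supply it:
# widths in thousandths of an em, independent of point size.
FONT_WIDTHS = {
    ("TR", "a"): 444,
    ("TR", "hy"): 333,
}


def glyph_width(key: GlyphKey) -> float:
    """Width in points: the font-relative width scaled by the point size (1 em = point size)."""
    units = FONT_WIDTHS[(key.font, key.char_name)]
    return units * key.point_size / 1000.0
```

With these values, `glyph_width(GlyphKey("TR", 10, "a"))` gives 4.44 points. The same character at 12 points is a different glyph, 5.328 points wide. The character name alone does not determine the glyph.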
Exactly. But the current terminology in gtroff is highly ambiguous,
and I believe that we need a clear separation between characters and
glyphs.

> Logically, therefore, troff could be "neutral" about what the byte
> "a" stands for. From that point of view, a troff which makes no
> assumptions of this kind, and which consults external tables about
> the meaning of its input and about the characteristics of what
> output that input implies, purely for the purpose of correct
> formatting, is perhaps the pure ideal. And from that point of view,
> therefore, unifying the input conventions on the basis of a
> comprehensive encoding (such as UTF-8 or Unicode is intended to
> become) would be a great step towards attaining this neutrality.

I fully agree. A single input character set (as universal as possible)
is the right thing, and everything else should be handled by
preprocessors (and by a postprocessor for tty output).

> Meanwhile, interested parties who have not yet studied it may find
> the "UTF-8 and Unicode FAQ for Unix/Linux" by Markus Kuhn well worth
> reading:
>
> http://www.cl.cam.ac.uk/~mgk25/unicode.html

Yes, Markus is doing an excellent job.

> By the way, your comment that hyphenation, for instance, is not a
> "glyph question" is, I think, not wholly correct. Certainly,
> hyphenation _rules_ are not a glyph question: as well as being
> language-dependent, there may also be "house rules" about it; these
> come under "typographic style" as above. But the size of a hyphen
> and associated spacing are glyph issues, and these may interact with
> where a hyphenation occurs or whether it occurs at all, according to
> the rules.

I mean the algorithm for finding possible breakpoints, which must be
based on input characters. The final decision where a word will be
broken is of course a glyph issue.

Werner
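The split described above can be sketched in two parts. Finding breakpoints works on input characters. Choosing one works on glyphs. This is a minimal Python illustration under stated assumptions, not groff's algorithm. The candidate positions are assumed to come from a separate, character-based pass (for example, pattern hyphenation) and are simply passed in. `choose_break`, `width_of`, and `hyphen_width` are hypothetical names.

```python
from typing import Callable, Optional, Sequence


def choose_break(
    word: str,
    candidates: Sequence[int],
    width_of: Callable[[str], float],
    hyphen_width: float,
    available: float,
) -> Optional[int]:
    """Pick the rightmost candidate break whose prefix plus hyphen fits.

    `word` is a sequence of characters (code points, not bytes).
    `candidates` are offsets between characters, found beforehand from the
    characters alone (hypothetical earlier pass); they are taken as given.
    `width_of` maps a character to its glyph width in the current font and
    size. Returns None if no candidate fits, i.e. the word is not broken.
    """
    best = None
    for pos in sorted(candidates):
        if not 0 < pos < len(word):
            continue  # a break must leave characters on both sides
        prefix_width = sum(width_of(ch) for ch in word[:pos])
        if prefix_width + hyphen_width <= available:
            best = pos
        else:
            # Assuming non-negative widths, later candidates are wider still.
            break
    return best
```

With a uniform width of 5 points per glyph and a 3-point hyphen, `choose_break("typesetting", [4, 7], lambda ch: 5.0, 3.0, 30.0)` returns 4. With only 22 points available it returns `None`. The 20-point prefix "type" would fit on its own, but not together with the hyphen. The hyphen's size thus decides whether a break occurs at all, even though the candidate positions were found from characters.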

