Hi Werner (and all),

Thanks for this clarifying explanation. I have a couple of comments: the first is explanatory; the second, I think, may point to the core of the question.
On 21-Oct-00 Werner LEMBERG wrote:
>> Troff's multi-character naming convention means that anything you
>> could possibly need can be defined, and given a name in the troff
>> input "character set" whenever you really need it, so long as you
>> have the device resources to render the appropriate glyph.
>
> There are only 256 `multi-characters' named `charXXX'. Everything
> else are glyph entities (even if they behave like a character in most
> cases). The reality is that groff doesn't really make a difference
> between a character and a glyph, and it has high priority to me to
> implement this distinction. I'll probably start with renaming a lot
> of troff internals.

1. Perhaps I should clarify: by "multi-character naming convention" I mean the fact that you can decide to use a sequence of ASCII characters, for instance "\[O-ogonek]", as the name of a "character". In passing: I see no _logical_ distinction between using a string of ASCII characters to name a "character" and using a string of bytes which implements a UTF-8 encoding.

2. It may be a good point of view to see troff (gtroff) as an engine which handles _glyphs_, not characters, in a given context of typographic style and layout. The current glyph is defined by the current point size, the current font, and the name of the "character" which is to be rendered, and troff necessarily takes account of the metric information associated with this glyph. The fact that the ASCII characters, and the ISO Latin-1 characters corresponding to byte values above 127, are (by default) the troff names of "characters" in a group of European languages (together with certain other marks and symbols) is, in my view, logically an irrelevant coincidence. It happens to be very convenient for people using these languages, but it is not at all necessary.
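To make the two points above concrete, here is a minimal Python sketch of a hypothetical formatter model, not groff's actual internals. It assumes UTF-8 input: the input bytes are decoded, the resulting name is mapped to a glyph fixed by the current font and that name, and the glyph's width is scaled by the current point size. The tables `INPUT_NAMES` and `METRICS`, the functions `resolve` and `advance`, the font names "R" and "I", and all widths are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical, much-simplified model: NOT groff's real data structures.
# A glyph is fixed by (font, glyph name); its rendered width then
# depends on the current point size.


@dataclass(frozen=True)
class Glyph:
    font: str
    name: str
    width_em: float  # advance width as a fraction of the em (invented)


# Input names -> glyph names. The ASCII multi-character name "O-ogonek"
# (as in the troff escape \[O-ogonek]) and the single character U+01EA,
# which arrives as the two UTF-8 bytes C7 AA, are two spellings of one name.
INPUT_NAMES = {
    "O-ogonek": "Oogonek",
    "\u01ea": "Oogonek",
    "a": "a",
}

# Hypothetical metric table keyed by (font, glyph name).
METRICS = {
    ("R", "Oogonek"): 0.72,
    ("R", "a"): 0.44,
    ("I", "a"): 0.50,
}


def decode_input(raw: bytes) -> str:
    """Interpret raw input bytes as UTF-8 (ASCII bytes decode unchanged)."""
    return raw.decode("utf-8")


def resolve(font: str, raw: bytes) -> Glyph:
    """Map an input byte string to the glyph it names in the given font."""
    name = INPUT_NAMES[decode_input(raw)]
    return Glyph(font, name, METRICS[(font, name)])


def advance(glyph: Glyph, point_size: float) -> float:
    """Advance width in points: the em scales with the current point size."""
    return glyph.width_em * point_size
```

For example, `resolve("R", b"O-ogonek")` and `resolve("R", "\u01ea".encode("utf-8"))` return equal glyphs, and doubling the point size doubles `advance`. One consequence worth noting: a lone ISO Latin-1 byte such as 0xE9 (é) is not valid UTF-8, so `decode_input(b"\xe9")` raises `UnicodeDecodeError`; that is exactly where the Latin-1 default and a UTF-8 input convention part ways.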
Nothing at all stops you from defining

    .char a \(*a

so that the input "a" is rendered as Greek "alpha", and so on, if you want to simplify the typing of input in a passage of Greek using an ASCII interface. Logically, therefore, troff could be "neutral" about what the byte "a" stands for.

From that point of view, a troff which makes no assumptions of this kind, and which consults external tables about the meaning of its input and about the characteristics of the output that input implies, purely for the purpose of correct formatting, is perhaps the pure ideal. And from that point of view, unifying the input conventions on the basis of a comprehensive character set and encoding (such as Unicode, encoded as UTF-8, is intended to become) would be a great step towards attaining this neutrality.

However, I want to think more about this issue. Meanwhile, interested parties who have not yet studied it may find the "UTF-8 and Unicode FAQ for Unix/Linux" by Markus Kuhn well worth reading:

    http://www.cl.cam.ac.uk/~mgk25/unicode.html

By the way, your comment that hyphenation, for instance, is not a "glyph question" is, I think, not wholly correct. Certainly, hyphenation _rules_ are not a glyph question: as well as being language-dependent, they may also be subject to "house rules"; these come under "typographic style" as above. But the width of the hyphen and the associated spacing are glyph issues, and these may interact with where a hyphenation occurs, or whether it occurs at all, according to the rules.

An interesting debate!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 284 7749
Date: 21-Oct-00
Time: 23:47:03
------------------------------ XFMail ------------------------------