> Ah, thank you.  So you are mapping the Russian alphabet to internal
> characters corresponding to KOI-8-R and then using a hyphenation
> pattern in the same encoding.

Yes.

> This way, not only UTF-8 input may be fed to groff, but also KOI-8-R
> -- just omit preconv processing (the -K or -k option to groff)!

Yes.

> Here's my understanding of what happens:
>
> ----------------------------------------------------
>   8-bit input                 UTF-8 input
> ----------------------------------------------------
>   The input file is read      The input file is first
>   in, and the input map-      processed by preconv,
>   ping, in your case speci-   that converts input char-
>   fied in koi8-r.tmac, is     acters into AGL-compati-
>   applied. As the result,     ble inner entities. Then,
>   the text is converted to    the mapping file is
>   AGL-compatible sequence     applied, but it has no
>   of entities.                effect, because the input
>                               stream now consists of
>                               directly specified AGL-
>                               compatible entities in
>                               the form \[uXXXX].

Correct.  However, please avoid the term `AGL compatible'.  We are not
talking about glyphs but about characters!  Contrary to TeX, groff
handles hyphenation before the conversion from characters to glyphs
has happened (more or less).

> Hyphenation patterns are read and converted into hyphenation codes.
>
> Since hyphenation patterns must be matched against the text in terms
> of hyphenation codes, groff needs somehow to derive the hyphenation
> code for each of the internal entities constituting its input.  It
> cannot be done directly with the input characters because they can
> have been translated by preconv.

This is not optimal, of course, since it limits hyphenation to
handling at most 256 characters.  In other words, languages like
Ethiopian, which uses an alphabet consisting of more than 256
characters, can't be hyphenated with the current implementation of
groff.  However, I won't change that...

> Therefore, I suppose that groff applies the existing character
> translations inversely to get back to some simple characters.  Then,
> hyphenation codes can be computed and compared against the
> hyphenation patterns.

Yes.

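To make the round trip above concrete, here is a rough Python model of
it.  This is only a sketch of the idea and not groff's code: the
function names (entity, from_utf8, from_koi8r, hyphenation_codes) are
made up for illustration, and the hyphenation code is simply assumed
to be the KOI8-R byte of the lowercase letter.

```python
# Toy model of the discussion above; NOT groff's actual implementation.
# All names here are hypothetical and chosen for illustration only.

def entity(ch):
    """Render one character as an internal entity of the form \\[uXXXX]."""
    return "\\[u%04X]" % ord(ch)

def from_utf8(text):
    """Role of preconv: every input character becomes an entity."""
    return [entity(ch) for ch in text]

def from_koi8r(data):
    """Role of the 8-bit input mapping: each KOI8-R byte becomes an entity."""
    return [entity(ch) for ch in data.decode("koi8_r")]

# Inverse translation: entity -> 8-bit hyphenation code.
# Assumption: the code is the KOI8-R byte of the lowercase letter,
# so uppercase and lowercase letters share one code.
INVERSE = {}
for byte in range(0xC0, 0x100):          # KOI8-R Cyrillic letters
    ch = bytes([byte]).decode("koi8_r")
    INVERSE[entity(ch)] = ch.lower().encode("koi8_r")[0]

def hyphenation_codes(entities):
    """Map entities back to 8-bit codes; None means no code is available."""
    return [INVERSE.get(e) for e in entities]

if __name__ == "__main__":
    word = "Да"
    # Both input routes end up with the same entities ...
    print(from_utf8(word) == from_koi8r(word.encode("koi8_r")))
    # ... and therefore with the same 8-bit hyphenation codes.
    print(hyphenation_codes(from_utf8(word)))
    # A character outside the 8-bit table has no code at all.
    print(hyphenation_codes(from_utf8("\u1200")))
```

Since the codes must fit into a single 8-bit table, a script with
more than 256 characters can't be covered in this model; such
characters simply get no code.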
> So, only the input stream gets processed by preconv, while the
> hyphenation codes and patterns are read in directly.  To make
> pattern matching possible, a set of .trin commands is used to define
> a mapping from internal entities to simple input characters and, via
> .hcode requests, to hyphenation codes.

Correct.

> But generally, this map cannot be inversely applied because several
> input characters may be mapped into one internal entity.  What does
> groff do in this case?

Please give me an example where this is relevant to hyphenation.


    Werner