Re: [Groff] Installing Russian Type-1 Fonts

Werner LEMBERG Fri, 19 Aug 2011 21:21:49 -0700

> Ah, thank you.  So you are mapping the Russian alphabet to internal
> characters corresponding to KOI-8-R and then using a hyphenation
> pattern in the same encoding.


Yes.

> This way, not only UTF-8 input may be fed to groff, but also KOI-8-R
> -- just omit preconv processing (the -K or -k option to groff)!

Yes.

> Here's my understanding of what happens:
> 
> ----------------------------------------------------
>        8-bit input                UTF-8 input
> ----------------------------------------------------
> The  input  file  is read  The input file  is  first
> in, and  the  input  map-  processed   by   preconv,
> ping, in your case speci-  that converts input char-
> fied in  koi8-r.tmac,  is  acters  into AGL-compati-
> applied.  As  the result,  ble inner entities. Then,
> the text is converted  to  the   mapping   file   is
> AGL-compatible   sequence  applied, but  it  has  no
> of entities.               effect, because the input
>                            stream  now  consists  of
>                            directly  specified  AGL-
>                            compatible  entities   in
>                            the form \[uXXXX].

Correct.  However, please avoid the term `AGL compatible'.  We are not
talking about glyphs but about characters!  Contrary to TeX, groff
handles hyphenation before the conversion from characters to glyphs
has happened (more or less).
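
A minimal Python sketch of the two routes in the table above, not
groff code.  It assumes that both routes end in the same \[uXXXX]
character entities, and it models the koi8-r.tmac input mapping as a
plain KOI8-R decode.  The function name to_entities is hypothetical.

```python
def to_entities(data: bytes, encoding: str) -> str:
    """Decode input bytes and write each non-ASCII character as \\[uXXXX]."""
    text = data.decode(encoding)
    return "".join(
        ch if ord(ch) < 128 else "\\[u{:04X}]".format(ord(ch))
        for ch in text
    )


word = "привет"

# Route 1: UTF-8 input, as if it had passed through preconv.
utf8_route = to_entities(word.encode("utf-8"), "utf-8")

# Route 2: 8-bit KOI8-R input, as if mapped by the input mapping file.
koi8_route = to_entities(word.encode("koi8_r"), "koi8_r")

print(utf8_route)
print(utf8_route == koi8_route)
```

Both routes yield the same sequence of characters, which is why the
later hyphenation step does not need to know which input encoding was
used.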

> Hyphenation patterns are read and converted into hyphenation codes.
> 
> Since hyphenation patterns must be matched against the text in terms
> of hyphenation codes, groff needs somehow to derive the hyphenation
> code for each of the internal entities constituting its input.  It
> cannot be done directly with the input characters because they can
> have been translated by preconv.

This is not optimal, of course, since it limits hyphenation to at most
256 distinct characters.  In other words, languages like Ethiopian,
which use an alphabet consisting of more than 256 characters, can't be
hyphenated with the current implementation of groff.  However, I won't
change that...
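
A rough Python check of the size argument, offered only as an
illustration.  It assumes that counting the letters in a Unicode block
is a fair stand-in for "size of the alphabet"; the helper
letters_in_range and the constant MAX_HYPHENATION_CODES are
hypothetical names, and the exact Ethiopic count depends on the
Unicode version shipped with your Python.

```python
import unicodedata

MAX_HYPHENATION_CODES = 256  # the limit stated in the message above


def letters_in_range(first: int, last: int) -> list:
    """Return all code points in [first, last] whose category is a letter."""
    return [
        chr(cp)
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)).startswith("L")
    ]


cyrillic = letters_in_range(0x0410, 0x044F)  # basic Russian letters
ethiopic = letters_in_range(0x1200, 0x137F)  # Ethiopic block

print("Cyrillic fits:", len(cyrillic) <= MAX_HYPHENATION_CODES)
print("Ethiopic fits:", len(ethiopic) <= MAX_HYPHENATION_CODES)
```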

> Therefore, I suppose that groff applies the existing character
> translations inversely to get back to some simple characters.  Then,
> hyphenation codes can be computed and compared against the
> hyphenation patterns.

Yes.

> So, only the input stream gets processed by preconv, while the
> hyphenation codes and patterns are read in directly.  To make
> pattern matching possible, a set of .trin commands is used to define
> a mapping from internal entities to simple input characters and, via
> .hcode requests, to hyphenation codes.

Correct.
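
The chain described above (internal entity, then simple 8-bit
character, then hyphenation code) can be sketched in Python as
follows.  This is not how groff is implemented.  It assumes that the
KOI8-R byte value of the lower-case letter serves as the hyphenation
code, and the name entity_to_hcode is hypothetical.

```python
import re

# Matches an entity of the form \[uXXXX] and captures the hex digits.
ENTITY = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def entity_to_hcode(entity: str):
    """Map a \\[uXXXX] entity to a one-byte code, or None if it has none."""
    match = ENTITY.fullmatch(entity)
    if match is None:
        raise ValueError("not a \\[uXXXX] entity: " + repr(entity))
    ch = chr(int(match.group(1), 16))
    try:
        # Fold case first, so upper and lower case share one code.
        return ch.lower().encode("koi8_r")[0]
    except UnicodeEncodeError:
        # No 8-bit counterpart in KOI8-R, so no hyphenation code.
        return None


print(entity_to_hcode("\\[u043F]"))  # Cyrillic small letter pe
print(entity_to_hcode("\\[u041F]"))  # Cyrillic capital letter pe
print(entity_to_hcode("\\[u1200]"))  # Ethiopic syllable ha
```

In this sketch the two Cyrillic entities share one code, while the
Ethiopic one has no 8-bit counterpart and so gets no code at all.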

> But generally, this map cannot be inversely applied because several
> input characters may be mapped into one internal entity.  What does
> groff do in this case?

Please give me an example where this is relevant to hyphenation.


    Werner
