Hello all,

I thought I had solved all my encoding problems until I tried to export my documents to HTML. It seems that my understanding of how groff maps input characters to its internal characters and then to output glyphs is incomplete. Below I describe what I did and what results I got.

I have a KOI8-R encoded file that contains the following letters, in hex notation: F0, C5, D2, D7, D9, CA.

I am using the koi8-r.tmac file, which maps these letters as follows:

----------------------------------
Char hex   Char dec   Mapped char
----------------------------------
F0         240        \[u041F]
C5         197        \[u0435]
D2         210        \[u0440]
D7         215        \[u0432]
D9         217        \[u044B]
CA         202        \[u0439]
----------------------------------

The values in the third column match the Unicode code points of the corresponding Russian letters.

When I process this file using the following MS-DOS batch script

    type %1 | groff -mkoi8-r -t -Thtml > %2

groff outputs six warning messages (one per symbol) of the form:

    stdin:1: warning: can't find special character '<SYMBOL>',

where <SYMBOL> takes the following values in turn: u041F, u0435, u0440, u0432, u044B, u0438_0306. These are exactly what the corresponding input characters map to, except for the last one, which turned into a composite code for a reason unknown to me.

The resulting HTML file looks quite correct and contains the following:

    <p>&#1055;&#1077;&#1088;&#1074;&#1099;&#1081;</p>

These decimal values correspond to the values of the internal characters in the table above. The -mkoi8-r option does work correctly: I verified this by removing it.

Here is what I do not understand and would appreciate your help with:

1. I tried to define glyphs for the characters reported in the warnings above, in the ...\font\devhtml\r file, like this:

       u041F 24 0 0x041F,

   but this affected neither the output nor the warning messages. Aren't these warnings about missing glyphs in the font file? If they are, then why didn't defining the glyphs for those characters work?

2. Why did the last warning mention the composite character u0438_0306 instead of the original u0439, to which it is mapped by the koi8-r.tmac file?

3. I saw the line "unicode" in the ...\font\devhtml\desc file, but the description of the DESC format does not mention the possibility of such a line. What does it do?

4. How do I set up groff to accept KOI8-R-encoded files and output HTML pages
   a. with the same encoding,
   b. with the UTF-8 encoding?

Thank you in advance,
Anton
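
P.S. The mapping table above can be cross-checked outside groff. What follows is only a minimal Python sketch, not anything groff itself runs. It assumes that Python's built-in koi8_r codec agrees with koi8-r.tmac for these six bytes; the helper names groff_name and describe are made up for illustration. It also prints the canonical decomposition (NFD) of each code point, which may be relevant to question 2, although whether groff applies the same decomposition internally is an assumption on my part.

```python
import unicodedata

# The six bytes from the input file, as listed in the table above.
KOI8_BYTES = bytes([0xF0, 0xC5, 0xD2, 0xD7, 0xD9, 0xCA])


def groff_name(text):
    """Format code points in the uXXXX[_XXXX] style seen in the warnings.

    Hypothetical helper: it only imitates the naming, it is not groff code.
    """
    return "u" + "_".join(f"{ord(ch):04X}" for ch in text)


def describe(raw):
    """Return (hex byte, decimal byte, composed name, NFD name) per byte."""
    rows = []
    for byte in raw:
        # Assumption: Python's koi8_r codec matches koi8-r.tmac here.
        ch = bytes([byte]).decode("koi8_r")
        # Canonical decomposition splits a precomposed letter into
        # base letter + combining mark when such a decomposition exists.
        nfd = unicodedata.normalize("NFD", ch)
        rows.append((f"{byte:02X}", byte, groff_name(ch), groff_name(nfd)))
    return rows


if __name__ == "__main__":
    for row in describe(KOI8_BYTES):
        print(*row)
```

Run as a script, this prints one row per input byte: the first three columns can be compared with the table, and the fourth with the symbol names in the warnings.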