> 1. Your 'charset' and 'encoding' are for troff or for preprocessor?

In general.  I want to define terms completely independent of any
particular program.  We have

  character set
  character encoding
  glyph set
  glyph encoding

>    I thought both of them are for preprocessor.  The preprocessor
>    figures out the way to convert the input to UTF-8 from the
>    information.

A groff preprocessor will work as you have described.

Under the assumption that you are talking about input characters, the
term `encoding' indeed implies the character set(s).

After some thinking I have to correct myself: It is better to say that
`EUC' is an `encoding scheme' which describes which character ranges
and how many bytes are used.  Sorry for the confusion.

> 2. Which will the pre/postprocessors handle, characters or glyphs?

The preprocessor converts from characters to characters (i.e., to
Unicode).  grotty + postprocessor convert glyph names back to Unicode
characters (using a hard-coded table), then from characters to
characters.  I don't know yet whether it makes sense to unify the
latter two programs.

> 3. Your 'charset' is for glyph and 'encoding' is for character?
>    I thought both of them are for character, since I thought both
>    of them are for preprocessor.

My point was to make the distinction between `set' and `encoding'
clear.  Maybe it is only of academic interest, but it (hopefully)
helps to clarify the terms used.

> 4. I thought we were discussing on (tags in roff source for)
>    preprocessor.  Is that right?

Yes.

>   roff source in any encoding like '\(co' (character)
>         |
>         | preprocessor
>         V
>   UTF-8 stream like u+00a9 (character)
>         |
>         | troff
>         V
>   glyph expression like 'co' (glyph)
>         |
>         | troff (continuing)
>         V

A step is missing here:

    typeset output (glyph)
          |
          | grotty
          V

>   UTF-8 stream like u+00a9 or '(C)' (character)
>         |
>         | postprocessor
>         V
>   formatted text in any encoding (character)


    Werner