> JIS X 0213 has many characters which are also included in JIS X 0212.
> It is very confusing.  I guess JIS people think JIS X 0212 is
> obsolete.

Basically, only Emacs supports JIS X 0212...

> A few characters in JIS X 0213 are not included in the present
> Unicode.

AFAIK, this will be fixed (or has already been fixed?) in the next
Unicode release where more than 10000 CJK characters are added (in the
surrogate area).

> > > Then the 'font definition file' will be irrationally large.
> >
> > Right.  I think I've answered this problem in my last mail (regarding
> > a `glyphclass' directive in font description files).
>
> Then all of these glyphs have to have the same width.

Why?  It is intended that `glyphclass' can occur multiple times.  Say,
one glyphclass command for full-width glyphs, and another one for
half-width glyphs.

> > Indeed, the default behaviour should be that the preprocessor adds
> > a
> >
> >   .mso tmac.<charset>
> >
> > line or something similar to the document, but there must be a
> > possibility to override it manually.
>
> Good idea.  Thus '#ifdef I18N' part can be restricted in pre/post-
> processors.

Exactly.

> Overriding?  Well, the current Groff has '-a' option.  I think this
> can be used for this purpose.  (Anyway, we can provide substitution
> only for non-letter symbols like soft-hyphen, '(C)', circles,
> squares, and so on.  I think this is sufficient.)

The `-a' option is almost useless today IMHO.  It will show a tty
approximation of the typeset output:

  groff -a -man -Tdvi troff.man | less

It is *not* the right way to quickly select an ASCII device.  To
override the macros used for the output character set we need a new
option.  Using `-a' is comparable to dvi2tty or similar converters.

> We have to think about uniting my idea on design of preprocessor
> and Ukai's idea of '.encoding "encoding-name"' in roff source.
>
>   - it is the preprocessor that handles the ".encoding" .
>   - priority is that
>     * --input-encodings wins.
>     * .encoding is next.
>     * then falling into the default (locale-sensible for i18n OS
>       and latin-1 for non-i18n OS).

Exactly.
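
To make the priority order concrete, here is a minimal sketch of how a
preprocessor might pick the input encoding.  It is only an illustration:
the function names are hypothetical, the `.encoding' syntax is assumed to
be exactly the quoted '.encoding "encoding-name"' form, and the `i18n_os'
flag stands in for whatever test would decide between the locale and
latin-1.

```python
import locale
import re

# Assumed syntax of the proposed request: .encoding "encoding-name"
ENCODING_REQUEST = re.compile(r'^\.\s*encoding\s+"?([^"\s]+)"?\s*$')


def find_encoding_request(lines):
    """Return the name given by the first `.encoding' request, or None."""
    for line in lines:
        match = ENCODING_REQUEST.match(line)
        if match:
            return match.group(1)
    return None


def default_encoding(i18n_os):
    """Locale-derived encoding on an i18n OS, latin-1 otherwise."""
    if i18n_os:
        return locale.getpreferredencoding(False) or "latin-1"
    return "latin-1"


def resolve_input_encoding(cli_encoding, lines, i18n_os=True):
    """Apply the priority: command line, then `.encoding', then default."""
    if cli_encoding:                       # --input-encodings wins
        return cli_encoding
    requested = find_encoding_request(lines)
    if requested:                          # .encoding is next
        return requested
    return default_encoding(i18n_os)       # fall back to the default
```

With this arrangement troff itself never sees the request; only the
preprocessor has to know about encodings.
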
Compare this to the Emacs model of `local variables'.  Note that such
an encoding request has to determine the encoding *and* character set
of a document (similar to Emacs).  I suggest that we don't use
`.encoding' but

  -*- charset-encoding: xxx -*-

in the first comment block (almost similar to Emacs).  troff shouldn't
notice encoding issues at all and just accept UTF-8.  If really
necessary, we can add two additional commands to select encoding and
character set separately:

  -*- charset: ...; encoding: ... -*-

Examples:

  .\" -*- charset: JIS-X-0208; encoding: EUC -*-
  .\" -*- charset: JIS-X-0208; encoding: ISO-2022 -*-


    Werner