Hi,

At Sat, 21 Oct 2000 10:46:51 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:

> In general.  I want to define terms completely independent on any
> particular program.  We have
>
>   character set
>   character encoding
>   glyph set
>   glyph encoding

I understand.  Since we are discussing the preprocessor, let's
concentrate on characters, not glyphs.  I think you will now agree to
specify the 'character set/encoding' by a single word such as 'EUC-JP'
instead of a pair such as 'JIS-X-0208' and 'EUC'.

BTW, I am implementing the preprocessor.  It currently has these
features:

 - input from standard input (stdin)
 - output to standard output (stdout)
 - I18N directive to support locale-sensitive mode
 - hard-coded converters from Latin1, EBCDIC, and UTF-8 to UTF-8
 - locale-sensitive converter from any encoding supported by the OS
   to UTF-8 (note: UTF-8 has to be supported by iconv(3))
 - the input encoding is determined by a command option or by the
   default
 - the default is 'latin1' when compiled without I18N, or
   locale-sensitive when compiled with I18N

However, the following is still to be done:

 - the encoding also has to be determined by a '-*- ... -*-'
   directive in the roff source
 - (I18N mode) the encoding has to be specifiable by MIME-style and
   Emacs-style names
 - efficiency of memory and CPU usage is not considered yet
 - input from files besides stdin

I will send the source soon.

> > roff source in any encoding like '\(co' (character)
> >    |
> >    | preprocessor
> >    V
> > UTF-8 stream like u+00a9 (character)
> >    |
> >    | troff
> >    V
> > glyph expression like 'co' (glyph)
> >    |
> >    | troff (continuing)
> >    V
>
> Here is missing a step:
>
>   typeset output (glyph)
>      |
>      | grotty
>      V
>
> > UTF-8 stream like u+00a9 or '(C)' (character)
> >    |
> >    | postprocessor
> >    V
> > formatted text in any encoding (character)

I understand well.  Thank you for your explanation.

BTW, besides TTY output, HTML will also need a postprocess step from
glyph to character, like 'grotty' does in tty mode, since HTML is a
text file.  I think the encoding for HTML can always be UTF-8.
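As a rough illustration of the glyph-to-character postprocess step in
the diagram above, here is a minimal Python sketch.  The glyph table,
the fallback strings, and the function name are hypothetical and
chosen only for this example; this is not how grotty or grohtml is
actually written.

```python
# Hypothetical glyph-name -> character table (assumption for this sketch;
# a real device would have a much larger table).
GLYPH_TO_CHAR = {
    "co": "\u00a9",  # copyright sign
    "rg": "\u00ae",  # registered sign
}

# Hypothetical ASCII fallbacks, used when the target encoding
# cannot represent the character.
FALLBACK = {
    "\u00a9": "(C)",
    "\u00ae": "(R)",
}


def postprocess(glyphs, encoding="utf-8"):
    """Turn a sequence of glyph names into bytes in the target encoding."""
    out = []
    for name in glyphs:
        # Names not in the table are passed through as ordinary text.
        ch = GLYPH_TO_CHAR.get(name, name)
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Not representable in the target encoding: use a fallback.
            ch = FALLBACK.get(ch, "?")
        out.append(ch)
    return "".join(out).encode(encoding)
```

With UTF-8 as the output encoding the fallback branch is never taken
for these glyphs, which is one reason a fixed UTF-8 encoding is
convenient for HTML output.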
We can add a line between <HEAD> and </HEAD>:

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">

(I found code in grohtml.cc that writes this line without the charset
directive.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
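
For reference, the input side of the preprocessor described in this
mail (pick an encoding from a command option or a default, then
convert stdin to UTF-8 on stdout) could be sketched in Python roughly
as follows.  This is only an illustration under assumptions: the
function names are hypothetical, Python's codecs stand in for
iconv(3), and the `i18n` flag stands in for the compile-time I18N
setting.

```python
import locale
import sys


def pick_encoding(option=None, i18n=False):
    """Choose the input encoding: explicit option first, then the default."""
    if option:
        return option
    if i18n:
        # Locale-sensitive default (stand-in for the I18N build).
        return locale.getpreferredencoding(False)
    # Default when built without I18N.
    return "latin1"


def to_utf8(data, encoding):
    """Convert bytes in the given encoding to UTF-8 bytes."""
    return data.decode(encoding).encode("utf-8")


def run(in_stream, out_stream, option=None, i18n=False):
    """Read all input, convert it to UTF-8, and write it out."""
    encoding = pick_encoding(option, i18n)
    out_stream.write(to_utf8(in_stream.read(), encoding))
```

A command-line wrapper would call
`run(sys.stdin.buffer, sys.stdout.buffer, option)` with the encoding
name taken from the command option.  Reading the whole input at once
keeps the sketch short; it ignores memory efficiency, which the
feature list above also leaves open.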