Re: [Groff] Re: groff: radical re-implementation

2000-10-25 Thread Tomohiro KUBOTA
Hi, At Wed, 25 Oct 2000 10:09:34 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > Hmm. What about the following temporary solution: > > Run the preprocessor twice; the first time it is called directly by > the groff program, and it returns an error code which groff can use > to s

Re: [Groff] Re: groff: radical re-implementation

2000-10-25 Thread Werner LEMBERG
> > What mechanism do you suggest for communication between the > > preprocessor and troff? > > Well, I thought again and conclude that the current version of Groff > cannot cooperate well with the preprocessor. If we want > locale-sensibility before the re-implementation of troff, I suggest > t

Re: [Groff] Re: groff: radical re-implementation

2000-10-24 Thread Tomohiro KUBOTA
Hi, At Tue, 24 Oct 2000 10:32:13 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: >> BTW, do you plan to release Groff with Japanese patch, with my >> preprocessor, as a makeshift until Groff with UTF-8 input will be >> available? (I thought so since you seem to be interested in my >> pre

Re: [Groff] Re: groff: radical re-implementation

2000-10-24 Thread Werner LEMBERG
> The algorithm will be: check locale and use > - -Tlatin1 for Latin-1 languages > - -Tnippon for Japanese > - -Tascii8 for other languages > if groff wrapper is invoked with -Ttty. (IMO, we should not override > user's specification of -Tlatin1, -Tascii, -Tnippon, and so on). What mechanism
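The wrapper rule quoted above can be sketched as follows. This is a minimal C++ sketch that assumes the locale name (for example the value of LANG) is already available as a string. The helper names and the list of "Latin-1 languages" are hypothetical and not taken from groff. Only a generic `-Ttty` request is remapped, so an explicit device choice is never overridden.

```cpp
#include <set>
#include <string>

// Hypothetical helper: the language part of a locale name such as
// "ja_JP.eucJP" or "de_DE@euro" (everything before '_', '.', or '@').
std::string locale_language(const std::string& locale_name) {
    std::string::size_type end = locale_name.find_first_of("_.@");
    return locale_name.substr(0, end);  // npos keeps the whole string
}

// Map -Ttty to a concrete device according to the locale; any other
// requested device (latin1, ascii, nippon, ...) is passed through untouched.
std::string choose_tty_device(const std::string& requested,
                              const std::string& locale_name) {
    if (requested != "tty")
        return requested;
    // Assumed, non-exhaustive list of languages written in Latin-1.
    static const std::set<std::string> latin1_languages = {
        "ca", "da", "de", "en", "es", "fi", "fr",
        "is", "it", "nl", "no", "pt", "sv"};
    const std::string lang = locale_language(locale_name);
    if (lang == "ja")
        return "nippon";
    if (latin1_languages.count(lang) != 0)
        return "latin1";
    return "ascii8";
}
```

For example, `choose_tty_device("tty", "ja_JP.eucJP")` yields `nippon`, while `choose_tty_device("ascii", "ja_JP.eucJP")` keeps the user's `ascii`.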

Re: [Groff] Re: groff: radical re-implementation

2000-10-24 Thread Tomohiro KUBOTA
Hi, At Mon, 23 Oct 2000 09:42:18 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > > - hard-coded converter from Latin1, EBCDIC, and UTF-8 to UTF-8 > > - locale-sensible converter from any encodings supported by OS to UTF-8 > >(note: UTF-8 has to be supported by iconv(3) ) > > May
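For the locale-sensitive converter, iconv(3) does the real work. The following is a minimal sketch that assumes a POSIX system whose iconv knows the source encoding name; `to_utf8` is a hypothetical helper, not groff code. Because only the target is fixed to UTF-8, the same loop serves EUC-JP, ISO-8859-x, and any other encoding the OS supports.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>
#include <vector>

// Convert `input` from `from_encoding` to UTF-8 using iconv(3).
std::string to_utf8(const std::string& input, const char* from_encoding) {
    iconv_t cd = iconv_open("UTF-8", from_encoding);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("encoding not supported by iconv");
    std::vector<char> in(input.begin(), input.end());
    char* inptr = in.data();
    size_t inleft = in.size();
    std::string result;
    char buf[256];
    while (inleft > 0) {
        char* outptr = buf;
        size_t outleft = sizeof buf;
        size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
        result.append(buf, outptr - buf);  // keep whatever was converted
        // E2BIG only means the output buffer filled up; loop again.
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }
    iconv_close(cd);
    return result;
}
```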

Re: [Groff] Re: groff: radical re-implementation

2000-10-23 Thread Werner LEMBERG
> I think you now will agree to specify the 'character set/encoding' > by a single word such as 'EUC-JP' instead of a pair of 'JIS-X-0208' > and 'EUC'. Yes :-) > BTW, I am implementing the preprocessor. Now it has features of: > - hard-coded converter from Latin1, EBCDIC, and UTF-8 to UTF-8 >
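The hard-coded Latin-1 path needs no conversion table, because Latin-1 byte values 0x00 to 0xFF coincide with code points U+0000 to U+00FF. A small sketch (the function name is hypothetical):

```cpp
#include <string>

// Latin-1 -> UTF-8: ASCII bytes pass through; every byte above 0x7F
// becomes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char b : in) {
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

For instance, é (0xE9 in Latin-1) becomes the UTF-8 bytes 0xC3 0xA9.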

Re: [Groff] Re: groff: radical re-implementation

2000-10-22 Thread Tomohiro KUBOTA
Hi, At Sat, 21 Oct 2000 10:46:51 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > In general. I want to define terms completely independent on any > particular program. We have > > character set > character encoding > glyph set > glyph encoding I understand. Since we are dis

Re: [Groff] Re: groff: radical re-implementation

2000-10-22 Thread Werner LEMBERG
> 2. Perhaps it is a good point of view to see troff (gtroff) as an > engine which handles _glyphs_, not characters, in a given context of > typographic style and layout. The current glyph is defined by the > current point size, the current font, and the name of the > "character" which is to be re

Re: [Groff] Re: groff: radical re-implementation

2000-10-22 Thread Werner LEMBERG
> > Well, maybe. But sometimes there is kerning. Please consult Ken > > Lunde's `CJKV Information Processing' for details. Example: > > > >〇 > >一 > >〇 > > Wow! This is the first time I received a Japanese mail from > non-Japanese sp

Re: [Groff] Re: groff: radical re-implementation

2000-10-22 Thread Werner LEMBERG
> However, I am interested in how Groff 1.16 works for UTF-8 input. > I could not find any code for UTF-8 input, though I found a code for > UTF-8 output in src/devices/grotty/tty.cc . Am I missing something? > (Of course /font/devutf8/* has no implementation of UTF-8 encoding, > though it seems
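At a minimum, UTF-8 input support means decoding the byte stream into code points before anything else happens. The following sketch of a strict decoder is hypothetical (`decode_utf8` is not code from groff 1.16). It rejects overlong forms, surrogate code points, and values above U+10FFFF.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 into 32-bit code points (UCS-4), throwing on bad input.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    static const std::uint32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = s[i];
        int len;
        std::uint32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (int k = 1; k < len; ++k) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Overlong encodings, surrogates, and out-of-range values are errors.
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}
```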

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Ted Harding
Hi Werner (and all) Thanks for this clarifying explanation. I have a couple of comments, one explanatory, the other which, I think, may point to the core of the question. On 21-Oct-00 Werner LEMBERG wrote: >> Troff's multi-character naming convention means that anything you >> could possibly need

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Werner LEMBERG
> 1. Your 'charset' and 'encoding' are for troff or for preprocessor? In general. I want to define terms completely independent on any particular program. We have character set character encoding glyph set glyph encoding >I thought both of them are for preprocessor. The preproces

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Werner LEMBERG
> A.1. At present troff accepts 8-bit input, i.e. recognises 256 > distinct entities in the input stream (with a small number of > exceptions which are "illegal"). We need at least 21 bits (for the Unicode BMP plus the supplementary planes reached through surrogates) and the special characters. A 32-bit wide number is thus the right choice IMHO
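The bit count follows from the largest Unicode scalar value, U+10FFFF. Since 2^20 = 0x100000 is too small, 21 bits are needed, and a 32-bit type leaves room above the Unicode range. The sketch below shows the surrogate-pair arithmetic. The idea of numbering troff special characters from 0x110000 upwards is a hypothetical scheme for illustration only.

```cpp
#include <cstdint>

// Largest Unicode scalar value: needs 21 bits.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Combine a UTF-16 surrogate pair (high D800-DBFF, low DC00-DFFF)
// into one code point in U+10000..U+10FFFF.
constexpr std::uint32_t from_surrogates(std::uint16_t hi, std::uint16_t lo) {
    return 0x10000 + ((static_cast<std::uint32_t>(hi) - 0xD800) << 10)
                   + (static_cast<std::uint32_t>(lo) - 0xDC00);
}

// Hypothetical: values above the Unicode range identify troff special
// characters, e.g. by index into a table of names such as "em" or "bu".
constexpr std::uint32_t kSpecialBase = kMaxCodePoint + 1;
constexpr std::uint32_t special_char(std::uint32_t index) { return kSpecialBase + index; }
constexpr bool is_special(std::uint32_t v) { return v > kMaxCodePoint; }

static_assert(from_surrogates(0xDBFF, 0xDFFF) == kMaxCodePoint,
              "the last surrogate pair maps to U+10FFFF");
```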

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Werner LEMBERG
> Would it be useful to add to the texinfo documentation a note > explaining that `-a' should only be used for these situations? I've added some words, thanks. Werner

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Tomohiro KUBOTA
Hi, At Sat, 21 Oct 2000 15:39:24 +0100 (BST), (Ted Harding) <[EMAIL PROTECTED]> wrote: > Someone writing a document about Middle Eastern and related literatures > may wish to use the Arabic, Persian, Hebrew, Turkish (all of which have > different scripts), and also various Central Asian languages

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Ted Harding
On 21-Oct-00 Tomohiro KUBOTA wrote: > Hi, > > At Fri, 20 Oct 2000 20:32:17 +0100 (BST), > (Ted Harding) <[EMAIL PROTECTED]> wrote: > >> B: Troff should be able to cope with multi-lingual documents, where >> several different languages occur in the same document. I do NOT >> believe that the right

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Tomohiro KUBOTA
Hi, At Fri, 20 Oct 2000 14:14:44 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > > I think *ideograms* have fixed width everywhere. > > Well, maybe. But sometimes there is kerning. Please consult Ken > Lunde's `CJKV Information Processing' for details. Exampl

Re: [Groff] Re: groff: radical re-implementation

2000-10-21 Thread Tomohiro KUBOTA
Hi, At Fri, 20 Oct 2000 20:32:17 +0100 (BST), (Ted Harding) <[EMAIL PROTECTED]> wrote: > It does not really matter that these are interpreted, by default, as > iso-latin-1. They could correspond to anything on your screen when you > are typing, and you can set up translation macros in troff to ma

Re: [Groff] Re: groff: radical re-implementation

2000-10-20 Thread Ted Harding
On 17-Oct-00 Werner LEMBERG wrote: > Well, I insist that GNU troff doesn't support multi-byte encodings at > all :-) troff itself should work on a glyph basis only. It has to > work with *glyph names*, be it CJK entities or whatever. Currently, > the conversion from input encoding to glyph entiti

Re: [Groff] Re: groff: radical re-implementation

2000-10-20 Thread T. Kurt Bond
Werner LEMBERG writes: > The `-a' option is almost useless today IMHO. It will show a tty > approximation of the typeset output: > > groff -a -man -Tdvi troff.man | less > > It is *not* the right way to quickly select an ASCII device. To > override the used macros for the output character set

Re: [Groff] Re: groff: radical re-implementation

2000-10-20 Thread Tomohiro KUBOTA
Hi, At Fri, 20 Oct 2000 14:45:51 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > First of all: We both mean the same, and we agree how to handle the > problem in groff. I'm only arguing about technical terms. > > Another try. > > Consider a PostScript font with its encoding vector.

Re: [Groff] Re: groff: radical re-implementation

2000-10-20 Thread Werner LEMBERG
> > The same exists for Japanese and Chinese, especially for vertical > > writing. > > I think *ideograms* have fixed width everywhere. Well, maybe. But sometimes there is kerning. Please consult Ken Lunde's `CJKV Information Processing' for details. Example:

Re: [Groff] Re: groff: radical re-implementation

2000-10-20 Thread Werner LEMBERG
> > This is not true. Encoding does *not* imply the character set. > > You are talking about charset/encoding tags. > > Hmm, I cannot understand your idea... > > I intend to mean > - character set: CCS (Coded Character Set) in RFC 2130 > - encoding: CES (Character Encoding Scheme) in RFC 2130
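The RFC 2130 distinction can be made concrete with one character. JIS X 0208 is the coded character set (CCS): it places あ (HIRAGANA LETTER A) at row 4, cell 2, stored as the 7-bit byte pair 0x24 0x22. EUC-JP and ISO-2022-JP are two character encoding schemes (CES) that serialize that same position differently. The struct and function names below are hypothetical.

```cpp
#include <string>

// A JIS X 0208 position as two 7-bit bytes (each 0x21-0x7E).
struct Jis0208 { unsigned char hi, lo; };

// EUC-JP: set the high bit of both bytes.
std::string to_euc_jp(Jis0208 c) {
    return {static_cast<char>(c.hi | 0x80), static_cast<char>(c.lo | 0x80)};
}

// ISO-2022-JP: ESC $ B switches to JIS X 0208, then the 7-bit bytes follow,
// and ESC ( B switches back to ASCII.
std::string to_iso2022jp(Jis0208 c) {
    return std::string("\x1b$B") + static_cast<char>(c.hi) +
           static_cast<char>(c.lo) + "\x1b(B";
}
```

The same CCS position therefore yields 0xA4 0xA2 in EUC-JP but an 8-byte escape-delimited sequence in ISO-2022-JP. Mail archives that drop the ESC byte show those switches as stray `$B` and `(B` fragments.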

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Tomohiro KUBOTA
Hi, At Thu, 19 Oct 2000 22:12:07 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > This is not true. Encoding does *not* imply the character set. > You are talking about charset/encoding tags. Hmm, I cannot understand your idea... In Emacs, charsets such as ISO8859-1, JISX0208.1990, an

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Tomohiro KUBOTA
Hi, At Thu, 19 Oct 2000 22:15:02 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > The same exists for Japanese and Chinese, especially for vertical > writing. I think *ideograms* have fixed width everywhere. Of course CJK languages have their own non-letter symbols which sometimes don'

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Werner LEMBERG
> Hanguls are usually treated as fixed-width, but it is not true. Many > Korean wordprocessor supports proportional Hangul fonts, and even > Korean Windows (9X/ME/2000) have proportional Hangul TrueType fonts > in default (e.g. Gulim. Gulim has one more variation, Gulim-che; it > is fixed-width f

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Werner LEMBERG
> > Note that such an encoding request has to determine the encoding *and* > > character set of a document (similar to Emacs). > (snip) > > Examples: > > .\" -*- charset: JIS-X-0208; encoding: EUC -*- > > .\" -*- charset: JIS-X-0208; encoding: ISO-2022 -*- > > No. only specifying 'encoding' i

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Tomohiro KUBOTA
Hi, At Thu, 19 Oct 2000 10:40:35 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > Note that such an encoding request has to determine the encoding *and* > character set of a document (similar to Emacs). (snip) > Examples: > .\" -*- charset: JIS-X-0208; encoding: EUC -*- > .\" -*- cha
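The single-word tag the thread settles on (e.g. `EUC-JP`) could be read from an Emacs-style first-line comment. The following is a minimal sketch. The key name is a parameter because the thread uses both `charset:` and `encoding:`, while Emacs itself uses `coding:`; which key groff would adopt is an assumption left open here. `file_variable` and `trim` are hypothetical helpers.

```cpp
#include <string>

// Strip leading and trailing blanks and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Return the value of `key` inside "-*- name: value; name: value -*-",
// or "" if the line carries no such tag.
std::string file_variable(const std::string& line, const std::string& key) {
    std::string::size_type start = line.find("-*-");
    if (start == std::string::npos) return "";
    start += 3;
    std::string::size_type end = line.find("-*-", start);
    if (end == std::string::npos) return "";
    std::string body = line.substr(start, end - start);
    std::string::size_type pos = 0;
    while (pos <= body.size()) {  // walk the ';'-separated pairs
        std::string::size_type semi = body.find(';', pos);
        if (semi == std::string::npos) semi = body.size();
        std::string item = body.substr(pos, semi - pos);
        std::string::size_type colon = item.find(':');
        if (colon != std::string::npos && trim(item.substr(0, colon)) == key)
            return trim(item.substr(colon + 1));
        pos = semi + 1;
    }
    return "";
}
```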

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread CHOI Junho
> "TK" == Tomohiro KUBOTA <[EMAIL PROTECTED]> writes: > Right. I think I've answered this problem in my last mail (regarding > a `glyphclass' directive in font description files). TK> Then all of these glyphs have to have the same width. Fortunately, TK> CJK ideograms and
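The `glyphclass` directive is only mentioned in this thread, not specified. The sketch below assumes it amounts to "one width for a contiguous code point range", with individual glyph entries overriding the class, as proportional Hangul or kerned punctuation would need. The class name, methods, and width units are all hypothetical.

```cpp
#include <cstdint>
#include <map>

struct GlyphClass { std::uint32_t first, last; int width; };

class FontMetrics {
public:
    // One entry covers a whole range (assumed not to overlap other ranges).
    void add_class(std::uint32_t first, std::uint32_t last, int width) {
        classes_[first] = {first, last, width};
    }
    void add_glyph(std::uint32_t cp, int width) { glyphs_[cp] = width; }

    // Per-glyph entries win over class entries; -1 means "unknown glyph".
    int width(std::uint32_t cp) const {
        auto g = glyphs_.find(cp);
        if (g != glyphs_.end()) return g->second;
        auto c = classes_.upper_bound(cp);  // first range starting after cp
        if (c == classes_.begin()) return -1;
        --c;                                // last range starting at or before cp
        return cp <= c->second.last ? c->second.width : -1;
    }

private:
    std::map<std::uint32_t, GlyphClass> classes_;
    std::map<std::uint32_t, int> glyphs_;
};
```

With `add_class(0x4E00, 0x9FFF, 1000)`, a single line in a font description file could stand in for about 20,000 identical ideograph widths.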

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Werner LEMBERG
> JIS X 0213 has many characters which are also included in JIS X 0212. > It is very confusing. I guess JIS people think JIS X 0212 is > obsolete. Basically, only Emacs supports JIS X 0212... > A few characters in JIS X 0213 are not included in the present > Unicode. AFAIK, this will be fixed

Re: [Groff] Re: groff: radical re-implementation

2000-10-19 Thread Tomohiro KUBOTA
Hi, At Wed, 18 Oct 2000 16:54:53 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > > However, thank you for explaining glyph. I also understand you > > understand problems on Japanese character codes well. > Well, I'm the author of the CJK package for LaTeX, I've written a > ttf2pk con

Re: [Groff] Re: groff: radical re-implementation

2000-10-18 Thread Werner LEMBERG
> However, thank you for explaining glyph. I also understand you > understand problems on Japanese character codes well. Well, I'm the author of the CJK package for LaTeX, I've written a ttf2pk converter, and I'm a member of the FreeType core team :-) > Note that CJK ideographs also has distin

Re: [Groff] Re: groff: radical re-implementation

2000-10-18 Thread Werner LEMBERG
> As regards line breaking algorithm, I think we need some more cflags, > at least for Japanese. That is, > >- lines must not be broken before the character >- lines must not be broken after the character > > These seems to be implemented as PRE_KINSOKU and POST_KINSOKU in > jgroff, bu
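The two proposed flags can be modelled as per-character bits consulted at every potential break point. Since Japanese and Chinese text has few spaces, any position between two characters is a candidate unless a flag forbids it. This is a minimal sketch: the flag names only echo jgroff's PRE_KINSOKU/POST_KINSOKU, and the tiny table is illustrative, not a complete kinsoku list.

```cpp
#include <cstdint>

enum BreakFlags : unsigned {
    kNoFlags = 0,
    kNoBreakBefore = 1,  // e.g. closing punctuation
    kNoBreakAfter = 2    // e.g. opening brackets
};

// Illustrative table only; a real one would come from font or locale data.
unsigned break_flags(char32_t c) {
    switch (c) {
    case U'\u3001':  // ideographic comma
    case U'\u3002':  // ideographic full stop
    case U'\u300D':  // right corner bracket
    case U'\uFF09':  // fullwidth right parenthesis
        return kNoBreakBefore;
    case U'\u300C':  // left corner bracket
    case U'\uFF08':  // fullwidth left parenthesis
        return kNoBreakAfter;
    default:
        return kNoFlags;
    }
}

// A line may break between prev and next unless either flag forbids it.
bool can_break_between(char32_t prev, char32_t next) {
    return !(break_flags(prev) & kNoBreakAfter) &&
           !(break_flags(next) & kNoBreakBefore);
}
```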

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Tomohiro KUBOTA
Hi, At Wed, 18 Oct 2000 00:46:46 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: >> - GNU troff will support UTF-8 only. Thus, multibyte encodings >>will not be supported. [Though UTF-8 is multibyte :-p ] > This was a typo, sorry. I meant that I don't want to support > multiple

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Werner LEMBERG
> > Well, I insist that GNU troff doesn't support multibyte encodings > > at all :-) troff itself should work on a glyph basis only. It has > > to work with *glyph names*, be it CJK entities or whatever. > > Currently, the conversion from input encoding to glyph entities > > and the further process

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread CHOI Junho
> "FU" == Fumitoshi UKAI <[EMAIL PROTECTED]> writes: FU> At Tue, 17 Oct 2000 22:19:09 +0900, FU> Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote: > Though you may already know, please note that > - Japanese and Chinese text contains few whitespace characters. > (Japanese

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread CHOI Junho
> "TK" == Tomohiro KUBOTA <[EMAIL PROTECTED]> writes: > The other merit of wchar_t is user-friendliness. Once a user sets the > LANG variable, all software works under the specified encoding. > If not, you have to specify encodings for every program. We don't > want to have

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Fumitoshi UKAI
At Tue, 17 Oct 2000 22:19:09 +0900, Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote: > Though you may already know, please note that > - Japanese and Chinese text contains few whitespace characters. >(Japanese and Chinese words are not separated by whitespace). >Therefore, different line-breaki

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Werner LEMBERG
> If you want English error messages but ISO-8859-1 you could use > something like > > LANG=en.ISO8859-1 > > or > > LANG=en LC_CTYPE=de_DE > > or > > LANG=de LC_MESSAGES=en Thanks for the info. My point was that groff must have i18n support fully functional without locales also. Werner

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Tomohiro KUBOTA
Hi, At Tue, 17 Oct 2000 09:20:37 +0200 (CEST), Werner LEMBERG <[EMAIL PROTECTED]> wrote: > Well, I insist that GNU troff doesn't support multibyte encodings at > all :-) troff itself should work on a glyph basis only. It has to > work with *glyph names*, be it CJK entities or whatever. Currently

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Edmund GRIMLEY EVANS
Werner LEMBERG <[EMAIL PROTECTED]>: > Believe me, most professional UNIX users in Germany don't have LANG > set correctly (including me). For example, I don't like to see German > error messages since I'm used to the English ones. In fact, I never > got used to the German way of handling compute

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Werner LEMBERG
> The merit of wchar_t is: write once and it works for every > encoding, including UTF-8. Otherwise, you have to write similar > source code many times for Latin-1, EBCDIC, UTF-8, and so on. > Especially, I will insist that Groff should support EUC-* multibyte > encodings for CJK language
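The "write once" argument rests on the C library's locale-driven conversion functions. In the sketch below, the same loop decodes EUC-J, UTF-8, or Latin-1 depending only on the LC_CTYPE locale the program has selected, for example with `setlocale(LC_ALL, "")`. `to_wide` is a hypothetical helper. Note that the width of wchar_t and the meaning of its values are platform-dependent, which is the main counter-argument raised in the thread.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Decode a byte string using whatever multibyte encoding the current
// LC_CTYPE locale defines (ASCII only in the default "C" locale).
std::wstring to_wide(const std::string& bytes) {
    std::mbstate_t state{};  // zero-initialized: initial shift state
    std::wstring out;
    const char* p = bytes.data();
    const char* end = p + bytes.size();
    while (p < end) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, p, end - p, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multibyte sequence");
        if (n == static_cast<std::size_t>(-2))
            throw std::runtime_error("incomplete multibyte sequence");
        if (n == 0)
            n = 1;  // an embedded NUL byte was decoded
        out.push_back(wc);
        p += n;
    }
    return out;
}
```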

Re: [Groff] Re: groff: radical re-implementation

2000-10-17 Thread Werner LEMBERG
> A small part of the source code of Groff related to I/O has to be > encoding-sensible. This part can handle Latin-1, EBCDIC, and UTF-8. > Additionally, if Groff is compiled within internationalized OS > (i.e. setlocale(), iconv(), nl_langinfo(), and so on are available), > the part also has loca
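On such an internationalized system, the locale-sensitive part starts by asking the C library which encoding the user's locale uses. The following is a minimal sketch using the POSIX `setlocale` and `nl_langinfo(CODESET)` calls; `locale_codeset` is a hypothetical helper. Only LC_CTYPE is adopted, so settings like `LANG=de LC_MESSAGES=en` keep messages and character handling independent. If the environment names no valid locale, the "C" locale stays in effect and its codeset (e.g. "ANSI_X3.4-1968" on glibc) is returned, which fits the requirement that groff work without locales.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Return the character encoding of the user's LC_CTYPE locale,
// e.g. "UTF-8", "EUC-JP", or "ISO-8859-1".
std::string locale_codeset() {
    // Adopt LC_CTYPE from LC_ALL / LC_CTYPE / LANG; on failure the
    // "C" locale simply remains active.
    std::setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

The returned name could then be passed to an iconv-based converter, with the hard-coded Latin-1/EBCDIC/UTF-8 paths as the fallback when iconv is unavailable.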

Re: [Groff] Re: groff: radical re-implementation

2000-10-16 Thread Tomohiro KUBOTA
Additional comments on my 'compromise' idea. > One compromise is that: > - to use UCS-4 for internal processing, not wchar_t. > - a small part of input and output to be encoding-sensible. A small part of the source code of Groff related to I/O has to be encoding-sensible. This part can handle
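The compromise can be pictured as a thin decoding layer in front of an encoding-agnostic core. The following is a minimal sketch under these assumptions: internal text is a vector of 32-bit values (UCS-4), independent of the platform's wchar_t width, and a hypothetical registry maps encoding names to hard-coded decoders. Only Latin-1 is filled in. UTF-8 and EBCDIC entries would sit beside it, and an iconv(3) wrapper could serve names the table does not know.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Everything past the I/O layer sees one 32-bit value per character.
using Ucs4String = std::vector<std::uint32_t>;
using Decoder = Ucs4String (*)(const std::string&);

// Latin-1: each byte value is its own code point.
Ucs4String decode_latin1(const std::string& in) {
    Ucs4String out;
    for (unsigned char b : in)
        out.push_back(b);
    return out;
}

// Hypothetical registry of built-in decoders; returns nullptr for unknown
// names so the caller can fall back to iconv(3) where available.
Decoder find_decoder(const std::string& encoding) {
    static const std::map<std::string, Decoder> table = {
        {"ISO-8859-1", &decode_latin1},
        {"latin1", &decode_latin1},
    };
    auto it = table.find(encoding);
    return it == table.end() ? nullptr : it->second;
}
```

Keeping the encoding knowledge in one table means the formatter proper never branches on input encodings, which is the point of restricting encoding sensitivity to "a small part" of the code.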