Hi,
At Wed, 25 Oct 2000 10:09:34 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> Hmm. What about the following temporary solution:
>
> Run the preprocessor twice; the first time it is called directly by
> the groff program, and it returns an error code which groff can use
> to s
> > What mechanism do you suggest for communication between the
> > preprocessor and troff?
>
> Well, I thought again and conclude that the current version of Groff
> cannot cooperate well with the preprocessor. If we want
> locale sensitivity before the re-implementation of troff, I suggest
> t
Hi,
At Tue, 24 Oct 2000 10:32:13 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
>> BTW, do you plan to release Groff with Japanese patch, with my
>> preprocessor, as a makeshift until Groff with UTF-8 input will be
>> available? (I thought so since you seem to be interested in my
>> pre
> The algorithm will be: check locale and use
> - -Tlatin1 for Latin-1 languages
> - -Tnippon for Japanese
> - -Tascii8 for other languages
> if groff wrapper is invoked with -Ttty. (IMO, we should not override
> user's specification of -Tlatin1, -Tascii, -Tnippon, and so on).
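The device-selection rule above can be sketched in C. This is a hypothetical illustration, not groff's actual wrapper code: the function name and the (far from complete) list of Latin-1 locales are invented here; only the mapping rule (ja -> nippon, Latin-1 locales -> latin1, everything else -> ascii8) comes from the mail.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch of the wrapper's -Ttty device selection.
   select_tty_device() and the latin1[] list are inventions for
   illustration; the mapping itself is the one proposed above. */
static const char *select_tty_device(const char *locale)
{
    if (locale == NULL || *locale == '\0')
        return "ascii";              /* no locale at all: plain ASCII */
    if (strncmp(locale, "ja", 2) == 0)
        return "nippon";             /* Japanese: -Tnippon */
    /* A few Latin-1 languages, for illustration only. */
    static const char *latin1[] = { "de", "fr", "es", "it", "pt", "en" };
    for (size_t i = 0; i < sizeof latin1 / sizeof latin1[0]; i++)
        if (strncmp(locale, latin1[i], 2) == 0)
            return "latin1";         /* Latin-1 languages: -Tlatin1 */
    return "ascii8";                 /* other languages: -Tascii8 */
}
```

As the mail notes, a real wrapper would consult this only when the user gave -Ttty, never overriding an explicit -Tlatin1, -Tascii, or -Tnippon.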
What mechanism
Hi,
At Mon, 23 Oct 2000 09:42:18 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> > - hard-coded converter from Latin1, EBCDIC, and UTF-8 to UTF-8
> > - locale-sensible converter from any encodings supported by OS to UTF-8
> > (note: UTF-8 has to be supported by iconv(3))
>
> May
> I think you now will agree to specify the 'character set/encoding'
> by a single word such as 'EUC-JP' instead of a pair of 'JIS-X-0208'
> and 'EUC'.
Yes :-)
> BTW, I am implementing the preprocessor. Now it has features of:
> - hard-coded converter from Latin1, EBCDIC, and UTF-8 to UTF-8
>
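The "locale-sensible converter from any encodings supported by OS" mentioned above can be sketched with iconv(3). This is a minimal illustration under the mail's assumption that the OS's iconv supports UTF-8 as a target; the function name is invented and error handling is reduced to the bare minimum.

```c
#include <iconv.h>
#include <string.h>

/* Sketch: convert a NUL-terminated string from an arbitrary
   OS-supported encoding to UTF-8 via iconv(3).  Returns 0 on
   success, -1 on failure.  to_utf8() is a name invented here. */
int to_utf8(const char *from_enc, const char *in, char *out, size_t outlen)
{
    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return -1;                   /* conversion pair unsupported */
    char *inp = (char *)in;          /* iconv() wants char **, not const */
    char *outp = out;
    size_t inleft = strlen(in), outleft = outlen - 1;
    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (r == (size_t)-1)
        return -1;                   /* invalid sequence or output full */
    *outp = '\0';
    return 0;
}
```

The hard-coded Latin-1/EBCDIC/UTF-8 converters would bypass this path, so the preprocessor still works on systems without iconv.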
Hi,
At Sat, 21 Oct 2000 10:46:51 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> In general. I want to define terms completely independent of any
> particular program. We have
>
> character set
> character encoding
> glyph set
> glyph encoding
I understand. Since we are dis
> 2. Perhaps it is a good point of view to see troff (gtroff) as an
> engine which handles _glyphs_, not characters, in a given context of
> typographic style and layout. The current glyph is defined by the
> current point size, the current font, and the name of the
> "character" which is to be re
> > Well, maybe. But sometimes there is kerning. Please consult Ken
> > Lunde's `CJKV Information Processing' for details. Example:
> >
> >    〇
> >    一
> >    〇
>
> Wow! This is the first time I received a Japanese mail from
> non-Japanese sp
> However, I am interested in how Groff 1.16 works for UTF-8 input.
> I could not find any code for UTF-8 input, though I found a code for
> UTF-8 output in src/devices/grotty/tty.cc . Am I missing something?
> (Of course /font/devutf8/* has no implementation of UTF-8 encoding,
> though it seems
Hi Werner (and all)
Thanks for this clarifying explanation. I have a couple of comments,
one explanatory, the other which, I think, may point to the core
of the question.
On 21-Oct-00 Werner LEMBERG wrote:
>> Troff's multi-character naming convention means that anything you
>> could possibly need
> 1. Your 'charset' and 'encoding' are for troff or for preprocessor?
In general. I want to define terms completely independent of any
particular program. We have
character set
character encoding
glyph set
glyph encoding
>I thought both of them are for preprocessor. The preproces
> A.1. At present troff accepts 8-bit input, i.e. recognises 256
> distinct entities in the input stream (with a small number of
> exceptions which are "illegal").
We need at least 20 bits (for the Unicode BMP + surrogates) and the
special characters. A 32-bit-wide number is thus the right choice IMHO
> Would it be useful to add to the texinfo documentation a note
> explaining that `-a' should only be used for these situations?
I've added some words, thanks.
Werner
Hi,
At Sat, 21 Oct 2000 15:39:24 +0100 (BST),
(Ted Harding) <[EMAIL PROTECTED]> wrote:
> Someone writing a document about Middle Eastern and related literatures
> may wish to use the Arabic, Persian, Hebrew, Turkish (all of which have
> different scripts), and also various Central Asian languages
On 21-Oct-00 Tomohiro KUBOTA wrote:
> Hi,
>
> At Fri, 20 Oct 2000 20:32:17 +0100 (BST),
> (Ted Harding) <[EMAIL PROTECTED]> wrote:
>
>> B: Troff should be able to cope with multi-lingual documents, where
>> several different languages occur in the same document. I do NOT
>> believe that the right
Hi,

At Fri, 20 Oct 2000 14:14:44 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:

> > I think *ideograms* have fixed width everywhere.
>
> Well, maybe. But sometimes there is kerning. Please consult Ken
> Lunde's `CJKV Information Processing' for details. Exampl
Hi,
At Fri, 20 Oct 2000 20:32:17 +0100 (BST),
(Ted Harding) <[EMAIL PROTECTED]> wrote:
> It does not really matter that these are interpreted, by default, as
> iso-latin-1. They could correspond to anything on your screen when you
> are typing, and you can set up translation macros in troff to ma
On 17-Oct-00 Werner LEMBERG wrote:
> Well, I insist that GNU troff doesn't support multi-byte encodings at
> all :-) troff itself should work on a glyph basis only. It has to
> work with *glyph names*, be it CJK entities or whatever. Currently,
> the conversion from input encoding to glyph entiti
Werner LEMBERG writes:
> The `-a' option is almost useless today IMHO. It will show a tty
> approximation of the typeset output:
>
> groff -a -man -Tdvi troff.man | less
>
> It is *not* the right way to quickly select an ASCII device. To
> override the used macros for the output character set
Hi,
At Fri, 20 Oct 2000 14:45:51 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> First of all: We both mean the same, and we agree how to handle the
> problem in groff. I'm only arguing about technical terms.
>
> Another try.
>
> Consider a PostScript font with its encoding vector.
> > The same exists for Japanese and Chinese, especially for vertical
> > writing.
>
> I think *ideograms* have fixed width everywhere.

Well, maybe. But sometimes there is kerning. Please consult Ken
Lunde's `CJKV Information Processing' for details. Example:

    $
> > This is not true. Encoding does *not* imply the character set.
> > You are talking about charset/encoding tags.
>
> Hmm, I cannot understand your idea...
>
> I intend to mean
> - character set: CCS (Coded Character Set) in RFC 2130
> - encoding: CES (Character Encoding Scheme) in RFC 2130
Hi,
At Thu, 19 Oct 2000 22:12:07 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> This is not true. Encoding does *not* imply the character set.
> You are talking about charset/encoding tags.
Hmm, I cannot understand your idea...
In Emacs, charsets such as ISO8859-1, JISX0208.1990, an
Hi,
At Thu, 19 Oct 2000 22:15:02 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> The same exists for Japanese and Chinese, especially for vertical
> writing.
I think *ideograms* have fixed width everywhere. Of course CJK languages
have their own non-letter symbols which sometimes don'
> Hangul is usually treated as fixed-width, but that is not true. Many
> Korean word processors support proportional Hangul fonts, and even
> Korean Windows (9X/ME/2000) ships proportional Hangul TrueType fonts
> by default (e.g. Gulim; Gulim has one more variation, Gulim-che; it
> is fixed-width f
> > Note that such an encoding request has to determine the encoding *and*
> > character set of a document (similar to Emacs).
> (snip)
> > Examples:
> > .\" -*- charset: JIS-X-0208; encoding: EUC -*-
> > .\" -*- charset: JIS-X-0208; encoding: ISO-2022 -*-
>
> No. Only specifying 'encoding' i
Hi,
At Thu, 19 Oct 2000 10:40:35 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> Note that such an encoding request has to determine the encoding *and*
> character set of a document (similar to Emacs).
(snip)
> Examples:
> .\" -*- charset: JIS-X-0208; encoding: EUC -*-
> .\" -*- cha
> "TK" == Tomohiro KUBOTA <[EMAIL PROTECTED]> writes:
> Right. I think I've answered this problem in my last mail (regarding
> a `glyphclass' directive in font description files).
TK> Then all of these glyphs have to have the same width. Fortunately,
TK> CJK ideograms and
> JIS X 0213 has many characters which are also included in JIS X 0212.
> It is very confusing. I guess JIS people think JIS X 0212 is
> obsolete.
Basically, only Emacs supports JIS X 0212...
> A few characters in JIS X 0213 are not included in the present
> Unicode.
AFAIK, this will be fixed
Hi,
At Wed, 18 Oct 2000 16:54:53 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> > However, thank you for explaining glyph. I also understand you
> > understand problems on Japanese character codes well.
> Well, I'm the author of the CJK package for LaTeX, I've written a
> ttf2pk con
> However, thank you for explaining glyph. I also understand you
> understand problems on Japanese character codes well.
Well, I'm the author of the CJK package for LaTeX, I've written a
ttf2pk converter, and I'm a member of the FreeType core team :-)
> Note that CJK ideographs also have distin
> As regards line breaking algorithm, I think we need some more cflags,
> at least for Japanese. That is,
>
>- lines must not be broken before the character
>- lines must not be broken after the character
>
> These seem to be implemented as PRE_KINSOKU and POST_KINSOKU in
> jgroff, bu
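The proposed flags above might surface as an extension of troff's existing `.cflags` request. The following fragment is purely hypothetical: the flag values 128 and 256 are invented here (groff's real cflags stop at lower values), and only the two break-prohibition semantics come from the mail.

```
.\" Hypothetical .cflags extension in the spirit of jgroff's
.\" PRE_KINSOKU / POST_KINSOKU; values 128 and 256 are invented:
.\"   128  lines must not be broken before the character
.\"   256  lines must not be broken after the character
.cflags 128 。 、 」 ）
.cflags 256 「 （
```

Closing punctuation would forbid a break before itself, opening punctuation a break after itself, which is the standard Japanese kinsoku rule.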
Hi,
At Wed, 18 Oct 2000 00:46:46 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
>> - GNU troff will support UTF-8 only. Thus, multibyte encodings
>> will not be supported. [Though UTF-8 is multibyte :-p ]
> This was a typo, sorry. I've meant that I don't want to support
> multiple
> > Well, I insist that GNU troff doesn't support multibyte encodings
> > at all :-) troff itself should work on a glyph basis only. It has
> > to work with *glyph names*, be it CJK entities or whatever.
> > Currently, the conversion from input encoding to glyph entities
> > and the further process
> "FU" == Fumitoshi UKAI <[EMAIL PROTECTED]> writes:
FU> At Tue, 17 Oct 2000 22:19:09 +0900,
FU> Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote:
> Though you may already know, please note that
> - Japanese and Chinese text contains few whitespace characters.
> (Japanese
> "TK" == Tomohiro KUBOTA <[EMAIL PROTECTED]> writes:
> The other merit of wchar_t is user-friendliness. Once a user sets the
> LANG variable, every piece of software works under the specified
> encoding. If not, you have to specify encodings for every program. We don't
> want to have
At Tue, 17 Oct 2000 22:19:09 +0900,
Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote:
> Though you may already know, please note that
> - Japanese and Chinese text contains few whitespace characters.
> (Japanese and Chinese words are not separated by whitespace).
> Therefore, different line-breaki
> If you want English error messages but ISO-8859-1 you could use
> something like
>
> LANG=en.ISO8859-1
>
> or
>
> LANG=en LC_CTYPE=de_DE
>
> or
>
> LANG=de LC_MESSAGES=en
Thanks for the info. My point was that groff must have fully
functional i18n support even without locales.
Werner
Hi,
At Tue, 17 Oct 2000 09:20:37 +0200 (CEST),
Werner LEMBERG <[EMAIL PROTECTED]> wrote:
> Well, I insist that GNU troff doesn't support multibyte encodings at
> all :-) troff itself should work on a glyph basis only. It has to
> work with *glyph names*, be it CJK entities or whatever. Currently
Werner LEMBERG <[EMAIL PROTECTED]>:
> Believe me, most professional UNIX users in Germany don't have LANG
> set correctly (including me). For example, I don't like to see German
> error messages since I'm used to the English ones. In fact, I never
> got used to the German way of handling compute
> The merit of wchar_t is that: write once and work for every
> encodings, including UTF-8. Otherwise, you have to write similar
> source code many times for Latin-1, EBCDIC, UTF-8, and so on.
> Especially, I will insist that Groff should support EUC-* multibyte
> encodings for CJK language
> A small part of the source code of Groff related to I/O has to be
> encoding-sensible. This part can handle Latin-1, EBCDIC, and UTF-8.
> Additionally, if Groff is compiled within internationalized OS
> (i.e. setlocale(), iconv(), nl_langinfo(), and so on are available),
> the part also has loca
Additional comments on my 'compromise' idea.
> One compromise is that:
> - to use UCS-4 for internal processing, not wchar_t.
> - a small part of input and output to be encoding-sensible.
A small part of the source code of Groff related to I/O
has to be encoding-sensible. This part can handle
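The locale-sensible branch of that I/O part would presumably ask the C library which codeset the current locale uses. A minimal sketch, assuming a POSIX system where setlocale(3) and nl_langinfo(3) are available; the function name and the Latin-1 fallback are inventions for illustration.

```c
#include <locale.h>
#include <langinfo.h>

/* Sketch: detect the input encoding from the user's locale.  Falls
   back to a guessed default when the locale gives no codeset.
   detect_input_encoding() is a name invented here. */
const char *detect_input_encoding(void)
{
    setlocale(LC_CTYPE, "");              /* honour LANG / LC_CTYPE */
    const char *cs = nl_langinfo(CODESET);
    return (cs && *cs) ? cs : "ISO-8859-1";   /* fallback is a guess */
}
```

On a non-internationalized build the hard-coded Latin-1/EBCDIC/UTF-8 converters would remain, so this call is only compiled in when the OS provides the locale APIs.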