Hello!

I will try to gather some information about Japanese, Chinese and Korean support for Polyglossia over the next few days.

Because I do not understand TeX programming at all, I can only give some information here. I will try to write it in as much detail as possible, so that the implementation should not be that hard :)


Here is what I understand so far about what is possible and what is too different:

For all three languages:

1. Line spacing needs to be increased. All characters in these three scripts are written in a square, which is like writing in capitals all the time in Latin fonts. Because of this, the default line spacing is too narrow. I do not yet know how large the line spacing should actually be, but I will try to figure that out. Also, the line spacing should depend on the surrounding text. If the default language of the document is a western language, the line spacing for e.g. \textkorean{} should not be increased, because this command is used to insert a bit of Korean into western text, where a larger line spacing is not desirable (you would not increase it for an abbreviation in all caps, either). If a CJK language is chosen with \setdefaultlanguage or \begin{korean}, the line spacing should be adjusted, though.

2. A date is written in this format: 2010 [word for year] 7 [word for month] 23 [word for day].
In Chinese and Japanese, this would be: 2010年7月23日
In Korean it would be 2010년7월23일

3. Chapter names etc. are written with the number between two words: ordinal prefix, number, “chapter”
e.g., “chapter 1”: 第1章 in Japanese or Chinese.

4. “Table of contents” etc. need to be translated.
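
To make points 2 and 3 concrete, here is a small sketch in Python (only for illustration, since I cannot write it in TeX). The function names and the language keys are made up by me and are not Polyglossia options.

```python
from datetime import date

# Words for year / month / day as given in point 2.
# The keys are hypothetical language tags, not Polyglossia option names.
DATE_WORDS = {
    "japanese": ("年", "月", "日"),
    "chinese": ("年", "月", "日"),
    "korean": ("년", "월", "일"),
}


def format_cjk_date(d, language):
    """Write each number followed by its unit word, without zero padding."""
    year_word, month_word, day_word = DATE_WORDS[language]
    return f"{d.year}{year_word}{d.month}{month_word}{d.day}{day_word}"


def format_chapter(number):
    """Japanese/Chinese chapter heading: ordinal prefix, number, 'chapter'."""
    return f"第{number}章"


print(format_cjk_date(date(2010, 7, 23), "japanese"))  # 2010年7月23日
print(format_cjk_date(date(2010, 7, 23), "korean"))    # 2010년7월23일
print(format_chapter(1))                               # 第1章
```

A real implementation would of course have to do the same thing inside \today and the chapter macros.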

-----

For Chinese and Japanese:

1. There are calendar systems in Taiwan and Japan which count the years from the founding of the Republic of China or from the accession of the current emperor, respectively. In Taiwan, one simply needs to subtract 1911 from the Western year. Also, one needs to write 民國 (Mínguó = “Republic”) in front of the year.
E.g.: 2010-07-23 -> 民國99年7月23日
In Japan, the year depends on the current emperor.
From 1868 to 1911: subtract 1867 and put 明治 (Meiji) before the number.
e.g.: 1905 -> 明治38年
From 1912 to 1925: subtract 1911, add 大正 (Taishō)
From 1926 to 1988: subtract 1925, add 昭和 (Shōwa)
From 1989: subtract 1988, add 平成 (Heisei)
These ranges are simplified to whole years. The era actually changes on the day of accession, so 1912, 1926 and 1989 each belong to two eras.
If it is the first year of an emperor, don’t write 1年, but write 元年, e.g. 昭和元年. I think only the most recent era, Heisei, is of practical relevance. It would be nice to include the other ones, though. Before 1868 it is too hard, because the lunar calendar was still in use at that time. I think nobody needs a calculation for that, though.

2. Both languages still use Chinese numerals, although to different degrees. They need to be converted from Arabic digits, and the method depends on the context. For year numbers and (seldom) page numbers: just replace every Arabic digit with the corresponding Chinese digit (一二三四五六七八九〇). E.g. page 354 = 三五四, year 1980 = 一九八〇年. But: 民國九十八年 (十 = 10; not sure about this), not 民國九八年.
For other numbers, the words for ten, hundred and thousand are used: e.g. 1324 = 一千三百二十四

3. Another option: if Arabic numerals are used, they may need to be converted to full-width digits, e.g. 3 = ３
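
The three points above can be sketched as follows, again in Python and only as an illustration. All function names are invented by me. The era table uses the simplified whole-year boundaries, the numeral function only covers 1 to 9999, and it simply skips zero digits, because I am not sure about the conventions for them.

```python
# Eras as (first Western year, name), newest first. Whole-year boundaries
# only; the real changeover happens in the middle of a year.
JAPANESE_ERAS = [(1989, "平成"), (1926, "昭和"), (1912, "大正"), (1868, "明治")]

DIGITS = "〇一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def minguo_year(year):
    """Taiwan: Western year minus 1911."""
    if year < 1912:
        raise ValueError("the Minguo calendar starts in 1912")
    return year - 1911


def japanese_era_year(year):
    """Return e.g. 明治38年; the first year of an era is written 元年."""
    for first_year, name in JAPANESE_ERAS:
        if year >= first_year:
            n = year - first_year + 1
            return f"{name}{'元' if n == 1 else n}年"
    raise ValueError("years before 1868 are not covered")


def digitwise_numeral(n):
    """Replace every Arabic digit by its Chinese digit (years, page numbers)."""
    return "".join(DIGITS[int(ch)] for ch in str(n))


def positional_numeral(n):
    """Spell out 1 to 9999 with 十, 百 and 千. Zero digits are skipped."""
    if not 0 < n < 10000:
        raise ValueError("this sketch only covers 1 to 9999")
    digits = str(n)
    parts = []
    for i, ch in enumerate(digits):
        d = int(ch)
        if d:
            parts.append(DIGITS[d] + UNITS[len(digits) - 1 - i])
    text = "".join(parts)
    # 10 to 19 are written 十, 十一, ... without a leading 一.
    return text[1:] if 10 <= n <= 19 else text


def to_fullwidth_digits(text):
    """Map the ASCII digits 0-9 to the full-width digits U+FF10 to U+FF19."""
    return text.translate({ord(c): ord(c) + 0xFEE0 for c in "0123456789"})


print(f"民國{minguo_year(2010)}年7月23日")               # 民國99年7月23日
print(japanese_era_year(1905))                          # 明治38年
print(japanese_era_year(1926))                          # 昭和元年
print(digitwise_numeral(1980) + "年")                    # 一九八〇年
print(positional_numeral(1324))                         # 一千三百二十四
print(f"民國{positional_numeral(minguo_year(2009))}年")  # 民國九十八年
print(to_fullwidth_digits("3"))                         # ３
```

For dates in the first or last year of an era, a real implementation would have to look at the month and day as well, not only at the year.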

--------

For Japanese:

1. Kinsoku shori (line breaking rules). In Japanese, a line cannot be broken at every character (as it can be in Chinese). Some punctuation marks must not start or end a line (e.g. 。、「), just like in western languages. Also, some kana are not allowed to start a line (ょ、ー、っ etc.). There are different levels of strictness: punctuation marks like 。 may never start a line, but for e.g. ょ the situation may be different. There could e.g. be an off setting plus three levels of strictness: off (break everywhere), low (break everywhere except in front of 。 etc.), medium (don’t break in front of ょ, but do break in front of ー), high (don’t break in front of ょ, ー or any similar character). Because Japanese is written without spaces, this can be a little difficult to achieve. Characters like 。、 are simply added at the end of the line, so that the line becomes a little longer. In other cases, it may be necessary to shrink or stretch the spacing. Usually, the only place where this is possible is before or after 。、「 and similar characters. Also, in some fonts the characters are not actually all the same size, so it may be possible to adjust the spacing there as well (not sure about that).
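
A rough sketch of these strictness levels in Python, only to show the idea. The function and the character lists are my own assumptions: I put っ on the same level as ょ, added the closing bracket 」 to the characters that must not start a line, and treat the long-vowel mark ー as the only extra character of the high level. The sketch only hangs characters at the end of the line or pushes them to the next line; it does not shrink or stretch any spacing.

```python
PUNCTUATION = "。、」"   # must never start a line (」 is my addition)
SMALL_KANA = "ょっ"      # assumed to belong to the medium level
LONG_MARK = "ー"         # only protected on the high level

# Characters that must not start a line, per strictness level.
NO_START = {
    "off": "",
    "low": PUNCTUATION,
    "medium": PUNCTUATION + SMALL_KANA,
    "high": PUNCTUATION + SMALL_KANA + LONG_MARK,
}
NO_END = "「"            # must not end a line


def break_lines(text, width, level="high"):
    """Cut text into lines of `width` characters, respecting kinsoku rules."""
    no_start = NO_START[level]
    no_end = "" if level == "off" else NO_END
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Hang characters that may not start the next line onto this line,
        # so the line becomes a little longer.
        while end < len(text) and text[end] in no_start:
            end += 1
        # Push characters that may not end a line down to the next line.
        while end < len(text) and end - i > 1 and text[end - 1] in no_end:
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines


print(break_lines("これはテストです。", 8, "off"))   # ['これはテストです', '。']
print(break_lines("これはテストです。", 8, "low"))   # ['これはテストです。']
print(break_lines("きょうはコーヒー", 5, "medium"))  # ['きょうはコ', 'ーヒー']
print(break_lines("きょうはコーヒー", 5, "high"))    # ['きょうはコー', 'ヒー']
```

In a real implementation in XeTeX this would probably be done with penalties and glue between character classes rather than by counting characters, but I do not know the details.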

For Chinese:

1. The lunar calendar is still in use (I don’t quite understand the calculation yet). But this is very optional: I don’t think it is ever used in academic writing, and even if it is, you could just write it by hand. It would be a nice feature, though.

2. Support for both simplified and traditional Chinese is needed. This would change the translations of “table of contents” etc., and may also have some other typographic effects.



Features which may not be easily achieved:

1. Vertical writing. Absolutely necessary, but I think extremely hard. It may need some drastic changes in XeTeX if it is not to be a dirty hack (“put every character in a box and then stack the boxes under each other”). Maybe it is not as necessary for academic writing, though; this depends on the subject. In subjects where mathematics is used, vertical writing is not useful, but I think it is still used extensively in subjects like history.

2. Ruby characters. They are also extremely necessary (for Japanese). They are smaller characters put above (or below) the Chinese character to indicate the reading. Basically, they are placed between the lines (in the line spacing), without changing the line spacing. There are different kinds of ruby annotation, e.g. mono ruby (every character has its own pronunciation) and group ruby (a complete word, consisting of multiple Chinese characters, has the reading put on top). Ruby characters are printed at half the size of the base text, which gives every Chinese character room for two ruby characters. There may be words where the reading is longer than that, e.g. 承る with the ruby characters うけたまわ; in that case the ruby can overlap the characters next to the word. In compounds, a space can also be put inside the word: e.g. the reading of 躊躇 (ちゅうちょ) would be too long, so the word may be stretched like 躊 躇.
In vertical writing, the ruby characters go on the right side of the line.

There are also ruby characters (Zhuyin Fuhao) in Taiwan, which are more complicated. In vertical writing, they are written on the right side of the line, as in Japanese. In horizontal writing, unlike in Japanese, they are written on the right side of each character. This is more difficult because the characters forming a syllable need to be stacked vertically, even in horizontal writing, while the tone mark goes on the right side of the syllable. It may be better to let an OpenType font handle the composition of the syllables (for example via ligatures), because I guess that XeTeX alone would not achieve a visually pleasing result. The problem is that there are no OpenType fonts which do that, as far as I know.

I think there is a ruby package for the old CJK package, but I don’t know if it still works with XeTeX.

3. Emphasis. There is no italic style for Chinese characters. In Japanese, emphasis is marked by putting 、 on top of every character (as a ruby character). This is quite easy to achieve once ruby characters are supported. I am not sure about Chinese, but I think they use a dot in a similar way.

4. Footnotes: in Japanese, they are also set like the emphasis mark, as a ruby character.
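
For the ruby sizing in point 2, the arithmetic can be sketched like this (Python, for illustration only). The function names are made up, widths are measured in units of one base character, and splitting the overhang equally between both sides and putting the extra space only between the base characters are my own assumptions.

```python
RUBY_SCALE = 0.5  # ruby characters are half the size of the base text


def ruby_overflow(base, ruby):
    """How much wider the ruby is than its base, in base character widths."""
    return max(len(ruby) * RUBY_SCALE - len(base), 0.0)


def overhang_per_side(base, ruby):
    """Overlap onto each neighbour if the ruby is centred over the base."""
    return ruby_overflow(base, ruby) / 2


def gap_between_base_chars(base, ruby):
    """Space to insert between base characters if the word is stretched."""
    if len(base) < 2:
        return 0.0  # a single character cannot be stretched
    return ruby_overflow(base, ruby) / (len(base) - 1)


print(ruby_overflow("承", "うけたまわ"))             # 1.5
print(overhang_per_side("承", "うけたまわ"))         # 0.75
print(gap_between_base_chars("躊躇", "ちゅうちょ"))  # 0.5
print(ruby_overflow("漢字", "かんじ"))               # 0.0
```

So 承 with うけたまわ needs one and a half character widths of extra room, while 躊躇 only needs half a character width between its two characters.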



OK, that is all that comes to mind right now. I will gather more information.

I wonder if polyglossia is the right approach for everything. Of course, translations of “table of contents” and e.g. kinsoku shori fit well into polyglossia, but what about ruby characters? I think it might be nice to have a CJK package which offers support for vertical writing, ruby, maybe calculation of the calendars etc. These features are extremely necessary for these languages, but may not be needed for other languages. Maybe it would be good if polyglossia loaded this package when it detects one of these three languages. That would make it easy to actually use, for example, Japanese, because you would not need to know which packages to load.

E.g. just load polyglossia and set Japanese, and it will automatically load the packages for vertical writing and ruby characters, without you having to load them yourself.

Because some of the most basic LaTeX features (like footnotes or emphasis) would require this special package, I think it would be best if polyglossia loaded it as well. But I’m not sure if the design of polyglossia allows this.


Gerrit


