[XeTeX] Japanese, Chinese, Korean support for Polyglossia

Gerrit Fri, 23 Jul 2010 08:17:45 -0700

Hello!

I will try to gather some information about Japanese, Chinese and Korean support for Polyglossia over the next few days.

Because I do not understand TeX programming at all, I can only provide information here. I will try to make it as detailed as possible, so that the implementation should not be too hard :)

Here is what I understand so far about what is possible and what is too different:


For all three languages:

1. Line spacing needs to be increased. All characters in these three scripts are drawn within a square, which is like writing in capitals all the time in Latin fonts, so the default line spacing would be too narrow. I do not yet know how large the line spacing should actually be, but I will try to figure that out. Line spacing should also depend on the text environment. If the default language of the document is a Western one, the line spacing for e.g. \textkorean{} should not be increased: one would use this command to enter some Korean text within Western text, where increasing the line spacing is not desirable (you would not do that for an abbreviation in all caps, either). If a CJK language is chosen with \setdefaultlanguage or \begin{korean}, however, the line spacing should be adjusted.

2. A date is written in this format: 2010 [word for year] 7 [word for month] 23 [word for day].

In Chinese and Japanese, this would be: 2010年7月23日
In Korean it would be 2010년7월23일

3. Chapter names etc. are written with the number between two words: an ordinal prefix, the number, and the word for “chapter”.

e.g., “chapter 1”: 第1章 in Japanese or Chinese.

4. Names such as “table of contents” need to be translated.
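A minimal Python sketch of points 2 and 3, assuming dates use unpadded Arabic numbers exactly as in the examples above. The names `DATE_UNITS`, `format_cjk_date` and `chapter_heading` are hypothetical and not part of Polyglossia.

```python
# Illustrative sketch only; all names are hypothetical.
from datetime import date

# Unit words for year, month and day (taken from the examples above).
DATE_UNITS = {
    "ja": ("年", "月", "日"),
    "zh": ("年", "月", "日"),
    "ko": ("년", "월", "일"),
}


def format_cjk_date(d: date, lang: str) -> str:
    """Number + unit word for year, month and day; no zero padding."""
    y, m, dd = DATE_UNITS[lang]
    return f"{d.year}{y}{d.month}{m}{d.day}{dd}"


def chapter_heading(n: int) -> str:
    """Ordinal prefix 第, the number, then 章 ("chapter"), as in Japanese or Chinese."""
    return f"第{n}章"


print(format_cjk_date(date(2010, 7, 23), "ja"))  # 2010年7月23日
print(format_cjk_date(date(2010, 7, 23), "ko"))  # 2010년7월23일
print(chapter_heading(1))                        # 第1章
```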

-----

For Chinese and Japanese:

1. Taiwan and Japan have their own calendar systems, which count years from the founding of the Republic of China or from the accession of the current emperor. In Taiwan, one simply subtracts 1911 from the Gregorian year and writes 民國 (Mínguó = “Republic”) in front of it.

E.g.: 2010-07-23 -> 民國99年7月23日
In Japan, the year depends on the reigning emperor:
From 1868 to 1911: subtract 1867 and add 明治 (Meiji) before the number.
e.g.: 1905 -> 明治38年
From 1912 to 1925: subtract 1911, add 大正 (Taishō)
From 1926 to 1988: subtract 1925, add 昭和 (Shōwa)
From 1989: subtract 1988, add 平成 (Heisei)

If it is the first year of an era, don't write 1年 but 元年, e.g. 昭和元年. I think only the current era, Heisei, is of practical relevance, but it would be nice to include the other ones as well. Before 1868 it is too hard, because the lunar calendar was still in use at that time. I don't think anybody needs that calculation, though.

2. Both languages still use Chinese numerals, although to different degrees. They need to be converted from Arabic digits, and the method depends on the context. For year numbers and (seldom) page numbers, simply replace every Arabic digit with the corresponding Chinese digit (一二三四五六七八九〇), e.g. page 354 = 三五四, year 1980 = 一九八〇年. But Taiwanese years are written 民國九十八年 (十 = 10; I am not sure about this), not 民國九八年.

For other numbers, the place values are written out, e.g. 1324 = 一千三百二十四.

3. Another option: if Arabic numerals are used, they may need to be converted to full-width digits, e.g. 3 = ３.
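The conversions in points 1 to 3 could be sketched in Python as follows. This is an illustration, not Polyglossia code, and all names are hypothetical. Era boundaries are simplified to whole Gregorian years as in the list above. The actual era changes happened partway through 1912, 1926 and 1989, so an exact converter would need the full date. The era list stops at Heisei. For the place-value form, the sketch adds two assumptions not stated above. Chinese writes 零 for a gap of zeros (1005 = 一千零五) and 十 rather than 一十 for 10 to 19. Japanese usually drops 一 before 十, 百 and 千 (1324 = 千三百二十四) and writes no 零.

```python
# Illustrative sketch only; all names are hypothetical, not Polyglossia code.

# (first Gregorian year, era name). Simplified to whole years as in the text;
# real era changes fell partway through 1912, 1926 and 1989.
JAPANESE_ERAS = [
    (1989, "平成"),  # Heisei
    (1926, "昭和"),  # Shōwa
    (1912, "大正"),  # Taishō
    (1868, "明治"),  # Meiji
]

DIGITS = "〇一二三四五六七八九"  # index = digit value


def japanese_era_year(year: int) -> str:
    """Gregorian year -> Japanese era year, e.g. 1905 -> 明治38年."""
    for first, name in JAPANESE_ERAS:
        if year >= first:
            n = year - first + 1
            # The first year of an era is 元年, not 1年.
            return f"{name}元年" if n == 1 else f"{name}{n}年"
    raise ValueError("years before 1868 are not supported (lunar calendar)")


def minguo_year(year: int) -> str:
    """Gregorian year -> Taiwanese (Minguo) year: subtract 1911."""
    return f"民國{year - 1911}年"


def digitwise(n: int) -> str:
    """Digit-by-digit form for years and page numbers: 1980 -> 一九八〇."""
    return "".join(DIGITS[int(c)] for c in str(n))


def positional(n: int, lang: str = "zh") -> str:
    """Place-value form for 1..9999: 1324 -> 一千三百二十四 (zh) or 千三百二十四 (ja)."""
    if not 0 < n < 10000:
        raise ValueError("this sketch only handles 1 to 9999")
    places = [(n // 1000 % 10, "千"), (n // 100 % 10, "百"), (n // 10 % 10, "十"), (n % 10, "")]
    out: list[str] = []
    pending_zero = False
    for d, unit in places:
        if d == 0:
            # Remember a gap of zeros, but only after a non-zero digit.
            pending_zero = bool(out)
            continue
        if pending_zero and lang == "zh":
            out.append("零")  # Chinese marks the gap once: 1005 -> 一千零五
        pending_zero = False
        if d == 1 and unit and (lang == "ja" or (unit == "十" and not out)):
            out.append(unit)  # 十五 rather than 一十五; Japanese drops 一 before 十/百/千
        else:
            out.append(DIGITS[d] + unit)
    return "".join(out)


def fullwidth_digits(s: str) -> str:
    """Replace ASCII digits with full-width ones: 3 -> ３."""
    return s.translate(str.maketrans("0123456789", "０１２３４５６７８９"))


print(japanese_era_year(1905), minguo_year(2010))  # 明治38年 民國99年
print(digitwise(1980) + "年", positional(1324))     # 一九八〇年 一千三百二十四
print(f"民國{positional(2009 - 1911)}年")            # 民國九十八年
```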


--------

For Japanese:

1. Kinsoku shori (line breaking rules). In Japanese, a line cannot be broken after every character (as it can be in Chinese). Some punctuation marks must not start a line (e.g. 。、) or end one (e.g. 「), just like in Western languages. Some kana are also not allowed to start a line (ょ, －, っ etc.).

There are different levels of strictness. A line may never start with punctuation marks like 。, but for e.g. ょ the situation may be different. There could, for example, be these levels: off (break everywhere), low (break everywhere except before 。 etc.), medium (don't break before ょ, but allow breaking before －), and high (don't break before ょ, － or any similar character).

Because Japanese is written without spaces, this can be somewhat difficult to achieve. Characters like 。、 are simply set at the end of the line (hanging punctuation), so that the line becomes a little longer. In other cases, it may be necessary to shorten or lengthen the spacing. Usually, the only place where this is possible is before or after 。、「 and similar characters. Also, in some fonts the characters are not all the same width, so it may be possible to adjust the spacing there (I am not sure about that).
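One way to model the strictness levels is as nested sets of characters that may not start a line, plus a set of characters that may not end one. The Python sketch below is an illustration under several assumptions. The class, function and level names are hypothetical, the character sets are a small incomplete subset rather than a full kinsoku table, and forbidding line-final opening brackets from the low level upward is my own choice. It only computes where breaks are allowed; hanging punctuation and spacing adjustment are not handled.

```python
# Illustrative sketch only: names and character sets are hypothetical and incomplete.
from enum import IntEnum


class Strictness(IntEnum):
    OFF = 0     # break everywhere
    LOW = 1     # no break before 。、 etc. (or after opening brackets)
    MEDIUM = 2  # additionally no break before small kana such as ょ, っ
    HIGH = 3    # additionally no break before the long-vowel mark


# Characters that may not start a line, grouped by the lowest level that forbids them.
NO_LINE_START = {
    Strictness.LOW: set("。、，．）」』"),
    Strictness.MEDIUM: set("ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ"),
    # Assumption: the text's － is taken to mean the prolonged sound mark ー; both are listed.
    Strictness.HIGH: set("ー－"),
}
# Characters that may not end a line (opening brackets), forbidden from LOW upwards.
NO_LINE_END = set("「『（")


def can_break_between(before: str, after: str, level: Strictness) -> bool:
    """May a line end with `before` and the next line start with `after`?"""
    if level == Strictness.OFF:
        return True
    if before in NO_LINE_END:
        return False
    return not any(level >= lvl and after in chars for lvl, chars in NO_LINE_START.items())


def break_points(text: str, level: Strictness) -> list[int]:
    """Indices i at which a line may be broken between text[i - 1] and text[i]."""
    return [i for i in range(1, len(text)) if can_break_between(text[i - 1], text[i], level)]


print(break_points("ちょっと。", Strictness.LOW))     # [1, 2, 3]
print(break_points("ちょっと。", Strictness.MEDIUM))  # [3]
```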


For Chinese:

1. The lunar calendar is still in use (I don't quite understand the calculation yet). But this is very optional. I don't think it is ever used in academic writing, and even if it were, you could just write the date by hand. It would be a nice feature, though.

2. Support for both Simplified and Traditional Chinese is needed. This would change the translations of “table of contents” etc., and may also have other typographic effects.
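To illustrate how the variant (and the language) changes caption names, here is a hypothetical caption table in Python. It is not Polyglossia's actual definitions. The keys use BCP 47 style tags (`zh-Hans`, `zh-Hant`), which is only an assumption about how the variants might be selected. The Korean "contents" entry is one common choice (목차 is also used).

```python
# Hypothetical caption table; not Polyglossia's actual definitions.
CAPTIONS = {
    "zh-Hans": {"contents": "目录", "figure": "图", "table": "表"},  # Simplified Chinese
    "zh-Hant": {"contents": "目錄", "figure": "圖", "table": "表"},  # Traditional Chinese
    "ja": {"contents": "目次", "figure": "図", "table": "表"},
    "ko": {"contents": "차례", "figure": "그림", "table": "표"},
}


def caption(lang: str, key: str) -> str:
    """Look up a caption, e.g. the table of contents heading, for a language/variant."""
    return CAPTIONS[lang][key]


print(caption("zh-Hans", "contents"), caption("zh-Hant", "contents"))  # 目录 目錄
```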




Features which may not be easy to achieve:

1. Vertical writing. Absolutely necessary, but I think extremely hard. It may need drastic changes in XeTeX if it is not to be a dirty hack (“put every character in a box and then stack all the boxes on top of each other”). It may not be as necessary for academic writing, though; this depends on the subject. In subjects where mathematics is used, vertical writing is not useful, but I think it is still used extensively in subjects like history.

2. Ruby characters. They are also extremely necessary (for Japanese). They are smaller characters placed above (or below) a Chinese character to indicate its reading. Basically, they are set between the lines (in the line spacing), without changing the line spacing. There are different kinds of ruby annotation, e.g. mono ruby (every character has its own reading) and group ruby (the reading of a whole word consisting of multiple Chinese characters is placed above it). Ruby characters are printed at half the size of the base text, which gives every Chinese character room for two ruby characters. Some readings are longer than that, e.g. 承る with the ruby characters うけたまわ; in that case the ruby can overlap the characters next to the word. In compounds, space can also be inserted between the base characters instead: e.g. 躊躇 (ちゅうちょ) would be too long, so it may be stretched like 躊 躇.

In vertical writing, the ruby characters go on the right side of the line.

There are also ruby characters in Taiwan (Zhuyin Fuhao), which are more complicated. In vertical writing, they are placed on the right side of the line, as in Japanese. In horizontal writing, however, unlike Japanese, they are placed to the right of each character. This is more difficult, because the symbols forming a syllable need to be stacked vertically, even in horizontal writing, while the tone mark goes to the right of the syllable. It may be better to let an OpenType font handle the composition of the syllables (for example via ligatures), because I guess that XeTeX would not achieve a visually pleasing result. The problem is that, as far as I know, there are no OpenType fonts that do that.

I think there is a ruby package for the old CJK package, but I don't know if it still works with XeTeX.

3. Emphasis. There is no italic writing for Chinese characters. In Japanese, emphasis is done by putting 、 above every character (as a ruby character). This is easy to achieve if ruby characters are supported. I am not sure about Chinese, but I think they do that with a dot, similarly to Japanese.

4. Footnotes: In Japanese, these are also done like the emphasis marks, as ruby characters.
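The ruby width arithmetic from point 2 can be sketched in Python as follows. It assumes every base character is exactly 1 em wide and every ruby character 0.5 em (half size, as described above). The names are hypothetical. The sketch only computes how much wider a reading is than its base (the amount that would overhang onto neighbouring characters). For the "stretch" strategy, it also computes how much space to put between the base characters. Emphasis from point 3 is modelled as mono ruby with 、 over each character.

```python
# Illustrative sketch in units of the base font's em; all names are hypothetical.
from dataclasses import dataclass

RUBY_SCALE = 0.5  # ruby set at half the base size


@dataclass
class RubyLayout:
    base_width: float  # each base character assumed to be exactly 1 em wide
    ruby_width: float  # natural width of the reading
    excess: float      # how much wider the reading is than its base (0 if it fits)


def measure(base: str, reading: str) -> RubyLayout:
    base_w = float(len(base))
    ruby_w = len(reading) * RUBY_SCALE
    return RubyLayout(base_w, ruby_w, max(0.0, ruby_w - base_w))


def spread_base(base: str, reading: str) -> list[float]:
    """Space (em) before, between and after the base characters (len(base) + 1 values)
    so the base becomes as wide as the reading. Compounds get the excess between
    their characters; a single character is padded equally on both sides."""
    excess = measure(base, reading).excess
    if len(base) == 1:
        return [excess / 2, excess / 2]
    inner = excess / (len(base) - 1)
    return [0.0] + [inner] * (len(base) - 1) + [0.0]


def emphasis(text: str, mark: str = "、") -> list[tuple[str, str]]:
    """Emphasis as mono ruby: the same mark over every base character."""
    return [(ch, mark) for ch in text]


print(measure("承", "うけたまわ").excess)  # 1.5 (overhangs onto neighbours)
print(spread_base("躊躇", "ちゅうちょ"))   # [0.0, 0.5, 0.0] -> 躊 躇
```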

OK, that is all that comes to mind right now. I will gather more information.

I wonder whether Polyglossia is the right approach for everything. Of course, translations of “table of contents” and e.g. kinsoku shori are a good fit for Polyglossia, but what about ruby characters? I think it may be nice to have a CJK package which offers support for vertical writing, ruby, maybe calculation of the calendars etc. These features are essential for these languages but may not be needed for other languages. Maybe it would be good if Polyglossia loaded this package when it detects one of these three languages. That would make it easy to actually use, for example, Japanese, because one would not need to know which packages to load.

E.g. just load Polyglossia and set Japanese, and it will automatically load the packages for vertical writing and ruby characters, without the need to load these packages yourself.

Because some of the most basic LaTeX features (like footnotes or emphasis) would require this special package, I think it would be best if Polyglossia also loaded it. But I'm not sure whether Polyglossia's design allows for this.



Gerrit



--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex
