[XeTeX] Polyglossia: Support for romanization of CJK

Gerrit Wed, 15 Jun 2011 11:47:41 -0700

Hello again, everyone,

I am currently writing an article in which I also have some romanization of Japanese. Until now, I have had to define the hyphenation manually, which I think is a bit of a nuisance.

So I wonder whether it is possible to include at least hyphenation for Japanese, Chinese and Korean? Full support of CJK scripts may be some way off, but I think that at least hyphenation patterns shouldn’t be that hard, because the romanizations are quite regular. Unfortunately, I don’t really have any idea how to do that, so would someone be willing to help me with it? I think the basic rules would be something like this (just some preliminary thoughts):


Japanese - Hepburn:

Syllable structure is always consonant-vowel or consonant-vowel-n. Sometimes, if there is a double consonant (e.g. “/asatte/”), hyphenation should take place between the two consonants.
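
To make the rule above concrete, here is a rough sketch in Python (not TeX hyphenation patterns) of how the break points could be computed. The function name split_hepburn and the handling of a syllable-final n before another consonant are my own assumptions; apostrophes (as in /kin’en/) and the ambiguous "n + y" case are not handled.

```python
# Hypothetical helper, not part of Polyglossia or any hyphenation package.
VOWELS = set("aeiouāīūēō")

def split_hepburn(word):
    """Split a lowercase Hepburn word into candidate hyphenation chunks."""
    chunks, current = [], ""
    i, n = 0, len(word)
    while i < n:
        # Collect the consonant run (possibly empty) ...
        j = i
        while j < n and word[j] not in VOWELS:
            j += 1
        run = word[i:j]
        # ... and the vowel run that follows it.
        k = j
        while k < n and word[k] in VOWELS:
            k += 1
        nucleus = word[j:k]
        if not nucleus:
            # Trailing consonants (e.g. a final n) stay with the last chunk.
            current += run
            break
        # Assumption: a doubled consonant, or n before another consonant
        # (except y), gives its first letter to the previous chunk.
        if current and len(run) >= 2 and (
            run[0] == run[1] or (run[0] == "n" and run[1] != "y")
        ):
            current += run[0]
            run = run[1:]
        # A new chunk starts at every consonant onset.
        if current and run:
            chunks.append(current)
            current = ""
        current += run + nucleus
        i = k
    if current:
        chunks.append(current)
    return chunks

print("-".join(split_hepburn("asatte")))
print("-".join(split_hepburn("konnichiwa")))
```

Real hyphenation patterns would also need minimum fragment lengths, so that a single letter such as the "a" of /asatte/ is not left alone at the end of a line.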


Chinese - Pinyin:

Syllables can end with a vowel (/lai/), n (/wan/) or ng (/zhong/). Some words like /xian/ cannot be hyphenated, in contrast to words like /Xi’an/. For that, maybe we could just insert all syllables (about 200 or so) in the hyphenation file. It is probably important that tone marks are ignored, so that /Zhōngwén/ is treated the same as /Zhongwen/.
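
As an illustration of the syllable-list idea, here is a small Python sketch. The SYLLABLES set is only a tiny illustrative subset, and the function names are made up for this example. It also assumes the usual Pinyin spelling convention that a syllable starting with a, o or e inside a word is preceded by an apostrophe, which is what keeps /xian/ in one piece while /Xi’an/ is split.

```python
import unicodedata

# Combining macron, acute, caron, grave: the four Pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Illustrative subset only; a real table would list every syllable.
SYLLABLES = {"lai", "wan", "zhong", "wen", "xi", "an", "xian"}

def strip_tones(text):
    """Remove tone marks but keep the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

def segment(piece, initial=True):
    """Return a list of syllables covering piece, or None if impossible."""
    if not piece:
        return []
    for end in range(len(piece), 0, -1):
        head = piece[:end]
        if head not in SYLLABLES:
            continue
        # Assumption: without an apostrophe, a non-initial syllable
        # cannot begin with a, o or e.
        if not initial and head[0] in "aoe":
            continue
        rest = segment(piece[end:], initial=False)
        if rest is not None:
            return [head] + rest
    return None

def split_pinyin(word):
    """Split a Pinyin word into syllables, ignoring tone marks and case."""
    plain = strip_tones(word).lower().replace("\u2019", "'")
    result = []
    for piece in plain.split("'"):
        syllables = segment(piece)
        # Unknown pieces are returned whole rather than guessed at.
        result.extend(syllables if syllables is not None else [piece])
    return result

print(split_pinyin("Zhōngwén"))
print(split_pinyin("xian"))
print(split_pinyin("Xi’an"))
```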


Korean:
No idea, actually. :(

For Chinese, it would also be nice to have some kind of tone-mark escaping. Either, for ease of typing, do it automatically when a syllable is followed by a number: Zhōngwén: \textchinese{Zhong1wen2}. Or do it with some kind of escaping: \textchinese{\Zhong1\wen2} or something like that. The first method would probably be nicer to type, but could be a nuisance if you want to mix numbers with text, although I think that this will not be the case that often. For Wade-Giles, the same thing could be done to put the tone numbers in superscript (Chung¹-wen²). For that, I think the writer has to choose the romanization system in advance.
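
The automatic variant could work roughly like the following Python sketch. This only shows the text transformation, not how it would be hooked into \textchinese, and all function names are hypothetical. The placement rule assumed here is the usual one: the mark goes on a or e if present, on the o of "ou", otherwise on the last vowel; tone 5 (neutral) adds no mark.

```python
import re
import unicodedata

# Tone number -> combining mark (macron, acute, caron, grave).
COMBINING = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
SUPERSCRIPTS = {"1": "\u00b9", "2": "\u00b2", "3": "\u00b3", "4": "\u2074"}

def mark_syllable(syllable, tone):
    """Put the tone mark for tone on the right vowel of one syllable."""
    mark = COMBINING.get(tone)
    if mark is None:
        return syllable  # tone 5: neutral tone, no mark
    lower = syllable.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "iouü"]
        if not vowels:
            return syllable
        pos = vowels[-1]
    marked = syllable[:pos + 1] + mark + syllable[pos + 1:]
    return unicodedata.normalize("NFC", marked)

def numbers_to_marks(text):
    """Pinyin: turn 'Zhong1wen2' into 'Zhōngwén'."""
    return re.sub(r"([A-Za-zü]+)([1-5])",
                  lambda m: mark_syllable(m.group(1), m.group(2)), text)

def numbers_to_superscripts(text):
    """Wade-Giles: turn 'Chung1-wen2' into 'Chung¹-wen²'."""
    return re.sub(r"(?<=[A-Za-z])[1-4]",
                  lambda m: SUPERSCRIPTS[m.group(0)], text)

print(numbers_to_marks("Zhong1wen2"))
print(numbers_to_superscripts("Chung1-wen2"))
```

The sketch also shows the nuisance mentioned above: any letters directly followed by a digit from 1 to 5 are treated as a syllable with a tone number, so ordinary text mixed with numbers would have to be kept out of the converted argument.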

What do you think about that? Currently, Polyglossia has a huge “hole” for CJK languages. Even if the manpower is currently lacking for nice full support of the scripts themselves, I think romanization is needed as well (maybe even more). If we could start with at least hyphenation support for romanization, we could gradually improve support for the other features (spacing, word-breaking rules for Japanese, ruby, vertical writing etc.) as well. I think it is easier to start with some small, easy stuff instead of the difficult features.

I think providing translations for the table of contents and so on would be easy as well; this could be the next step.


Gerrit


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
