As far as I know, there is no kerning and there are no ligatures in
Chinese. All Chinese characters are "independent" and of exactly the
same width, so it is OK to calculate the width of a string as the
number of characters times the width of a single character.
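
A rough sketch in Python of what summing per-character widths could look
like (this is not LyX code). It assumes widths are measured in half-width
"cells" and that characters the Unicode database classifies as wide or
fullwidth take two cells; the function name display_width is made up for
illustration, and real font metrics may differ:

```python
import unicodedata

def display_width(text):
    """Return the width of text in half-width cells.

    Assumption: wide ("W") and fullwidth ("F") characters take two
    cells and everything else takes one. Real fonts may differ.
    """
    width = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width
```

Under that assumption, the width of a purely Chinese string is simply
twice its number of characters, with no need to measure growing prefixes.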

This post might help for the Unicode ranges:
http://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode
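
As a sketch, a range check could look like the following. The ranges
listed are the main CJK ideograph blocks that I am sure of; the list is
deliberately incomplete (later extensions, punctuation, and kana are left
out), and the names CJK_RANGES and is_cjk_ideograph are only for
illustration:

```python
# Main CJK ideograph blocks (an incomplete list, for illustration only).
CJK_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
]

def is_cjk_ideograph(ch):
    """Return True if the single character ch falls in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)
```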

One issue to keep in mind is that when you deal with a mixture of
Chinese and ASCII characters, different rules apply depending on
which characters fall at the margin. For example, suppose
"你好hello" reaches the margin. You can break inside the Chinese phrase:

[...]你
好hello[...]

or break between Chinese and English:

[...]你好
hello[...]

or break after English:

[...]你好hello
[...]

but you cannot break inside the English word, like this:

[...]你好he
llo[...]
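
The rules above can be sketched in Python as follows. This is a
simplification under stated assumptions, not a full line-breaking
algorithm: only the basic CJK Unified Ideographs block is treated as
Chinese, punctuation rules are ignored, and the names is_cjk and
break_positions are made up for illustration:

```python
def is_cjk(ch):
    """Assumption: only the basic CJK Unified Ideographs block counts."""
    return 0x4E00 <= ord(ch) <= 0x9FFF

def break_positions(text):
    """Return the indices i where a row break is allowed before text[i].

    A break is allowed next to a Chinese character (on either side) or
    after a space, but never between two non-Chinese, non-space
    characters, so an English word is never split.
    """
    positions = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if is_cjk(prev) or is_cjk(cur) or prev == " ":
            positions.append(i)
    return positions
```

For "你好hello" this gives positions 1 and 2, i.e. between the two
Chinese characters and between Chinese and English, but nothing inside
"hello".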


Regards,
Yihui
--
Yihui Xie <xieyi...@gmail.com>
Phone: 206-667-4385 Web: http://yihui.name
Fred Hutchinson Cancer Research Center, Seattle


On Wed, Jul 24, 2013 at 6:33 AM, Jean-Marc Lasgouttes
<lasgout...@lyx.org> wrote:
> 24/07/2013 09:39, Lin Wei:
>
>> Sorry for the late reply. I've been volunteer teaching in undeveloped areas
>> where I can only check my email intermittently.
>> Not really any more progress. I asked a further question and got no
>> reply....So I kind of gave up......
>
>
> Dear Lin Wei,
>
> I am sorry that I have not been as responsive as necessary. Actually, at the
> time I was still trying to understand how the row breaking algorithm works.
> Now I have partly rewritten it in branch features/str-metrics (method is now
> named breakRow), with the goal of computing metrics on whole strings to
> avoid problems related to ligatures and kerning.
>
> If we forget about insets, the algorithm is just to collect characters until
> a space is found and possibly break the row at this point. If I understand
> correctly, for Japanese or Chinese one could just break the row as soon as a
> character goes beyond the margin. I am wary of computing width of strings in
> an iterative way (a, then ab, then abc, then abcd...). Is it OK in Chinese
> and Japanese to compute the string length as sum of character lengths? (that
> is, are there kernings and ligatures in these languages?)
>
> Another question is: how do we recognize Chinese and Japanese characters? I
> guess they live in particular Unicode ranges.
>
> If things are really complicated, we could choose to rely on Qt's
> QTextBoundaryFinder, although this might be more complicated.
>
> Hope this helps.
>
> JMarc
>
