2011/9/12 Mojca Miklavec <mojca.miklavec.li...@gmail.com>:
> On Mon, Sep 12, 2011 at 09:36, Yves Codet wrote:
>> Hello.
>>
>> A question to specialists, Arthur and Mojca maybe :) Is it necessary to have
>> two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation
>> patterns are written in NFC, for instance, will they be applied correctly to
>> a document written in NFD?
>
> That depends on the engine.
>
> From what I understand, XeTeX does normalize the input, so NFD should
> work fine. But I'm only speaking from memory based on Jonathan's talk
> at BachoTeX. I might be wrong. I'm not sure what LuaTeX does. If one
> doesn't write the code, it might be that no normalization will ever
> take place.
>
I am not an expert on Unicode and do not know what XeTeX does and
when. I made a test in Hindi when implementing sort rules in Xindy.
I am speaking about sample 4, available from
http://icebearsoft.euweb.cz/xindy-devanagari/ (this is what I
presented last year in Brejlov). Hindi makes use of characters with
nuktas. For instance, za can be entered as U+095B or as ja U+091C
followed by nukta U+093C. The latter can be found in the wordlist used
in aspell. In my sample the first page contains a few words where all
"nukte vale" characters are written directly; on the second page the
same words are written using nukta signs. The first index shows that
the \index macro wrote the input without any change, so I had to use
merge rules in Xindy. I have not looked at what was written to the xdv
file, and now gedit does some strange things...
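
For illustration only, the two spellings of za can be compared with
Python's standard unicodedata module. This is a rough sketch and not
part of the Xindy sample; the helper name same_after_normalization is
made up here. It shows that the raw strings differ (hence the need for
merge rules when nothing normalizes the input) but compare equal once
both are normalized.

```python
import unicodedata

ZA_PRECOMPOSED = "\u095b"        # DEVANAGARI LETTER ZA
ZA_WITH_NUKTA = "\u091c\u093c"   # DEVANAGARI LETTER JA + DEVANAGARI SIGN NUKTA


def same_after_normalization(a, b, form="NFD"):
    """Hypothetical helper: compare two strings after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw code point sequences differ, so an unnormalized comparison fails.
print(ZA_PRECOMPOSED == ZA_WITH_NUKTA)

# After normalization the two spellings compare equal, in NFD and in NFC.
print(same_after_normalization(ZA_PRECOMPOSED, ZA_WITH_NUKTA, "NFD"))
print(same_after_normalization(ZA_PRECOMPOSED, ZA_WITH_NUKTA, "NFC"))

# U+095B is a composition exclusion, so even NFC yields JA + NUKTA.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ZA_PRECOMPOSED)])
```

Note that for the Devanagari nukta letters NFC does not produce the
precomposed code point, so patterns or sort rules written with U+095B
would not match normalized text in either form.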
> I can also easily imagine that our patterns don't work with NFD input
> with Hyphenator.js. I'm not sure how patterns in Firefox or OpenOffice
> deal with normalization. I never tested that.
>
> But in my opinion the engine *should* be capable of doing normalization.
> Else you can easily end up with an exponential problem. A pattern with 3
> accented letters can easily result in 8 or even more duplicated
> patterns to cover all possible combinations of composed-or-decomposed
> characters.
>
> Arthur had some plans to cover normalization in hyph-utf8, but I
> already hate the idea of a duplicated apostrophe, let alone all the
> duplications just for the sake of "stupid engines that don't
> understand unicode" :).
>
> Mojca
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
>  http://tug.org/mailman/listinfo/xetex
>

-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz