I'm glad to hear that there is finally some implementation of Roman transliteration in the Sanskrit hyphenation patterns. Keep up the good work!

While checking hyph-sa.tex, I wondered about two things (both unrelated to Dominik's problem):

1) I saw that all the diacritics used for IAST appear in the patterns, while some of them (for example ṛ and ṝ) are marked as "non standard transliteration". That is OK insofar as IAST is not a standard in the official sense. But IAST is the most commonly used scheme, and the "standard" transliteration of vocalic r in IAST is ṛ, not r̥. The latter belongs to ISO 15919, the international standard for the transliteration of Indic scripts. So if ISO 15919 is to be taken into account for the Sanskrit hyphenation patterns, it should be done completely. That means, for example, that ṁ should also be added and ṃ marked as "non standard transliteration", and so on. But I don't know how far one can go here. While IAST is meant exclusively for the transliteration of Sanskrit (I know it's also used for Pali, but in a slightly different way), ISO 15919 contains far more diacritics than are needed for Sanskrit. It is rather meant as a transliteration scheme for many or most Indian languages. Should it then be duplicated in the hyphenation patterns of every language in question?

2) This might be a stupid question, but aren't the hyphenation patterns for most abugida scripts more or less the same? That would mean hyphenation is script-dependent rather than language-dependent. Many hyphenation patterns have to be duplicated if they are organized by language, whereas one could have a single hyphen-indic.tex instead.

Have a nice weekend!
Manuel

2010/11/21 Dominik Wujastyk <wujas...@gmail.com>:
> That's extremely helpful! Thank you, Arthur.
>
> I've upped the first argument of hyphenmins to 2, which helps a lot for
> romanisation, but may make the Nagari breaks more difficult. I suppose it's
> not reasonable to assume that hyphenation parameters will be the same across
> different scripts.
>
> Best,
> Dominik
>
>
> On 20 November 2010 22:12, Arthur Reutenauer
> <arthur.reutena...@normalesup.org> wrote:
>>
>> > I'm really not sure what I'm getting as a result. It looks as if it's
>> > roman script being hyphenated as if it were Devanagari. The initial a-
>> > of several words, like arhasi, gets separated (a-rhasi), which might
>> > just about look okay in Nagari, but not in romanisation. Am I actually
>> > getting the right thing
>>
>> You're indeed getting what the patterns say. From what I read in
>> hyph-sa.tex, the patterns allow breaks after any vowel (but not inside
>> diphthongs), and forbids them before final consonants or consonant
>> clusters; and that's about it. It's certainly a debatable choice, but
>> it does seem like the patterns really aim at mimicking the way (say)
>> Sanskrit written using Devanagari is hyphenated. You would have to take
>> this up with Yves.
>>
>> > Why do I have to pretend that this is Devanagari (\devanagarifont)?
>>
>> This is by design in polyglossia (see gloss-sanskrit.ldf). You would
>> have to take this up with François. (And I'm the one responsible for
>> integrating hyph-sa.tex into hyph-utf8. Why does it seem like there is
>> a French mafia around Sanskrit support in XeTeX? ;-)
>>
>> Arthur
>>
>>
>> --------------------------------------------------
>> Subscriptions, Archive, and List information, etc.:
>> http://tug.org/mailman/listinfo/xetex