Re: [XeTeX] Uppercase in Armenian

Peter von Kaehne Sun, 01 May 2022 11:29:56 -0700

Generally speaking (and I am speaking here very much as a novice on LaTeX and
without any knowledge of Armenian), one nearly always does better to decompose
and normalise all UTF-8 text, then work on it, and only then recompose. I
think this approach would have worked out fine here.
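A minimal sketch of what decomposition does to U+0587. It is written in Python with the standard `unicodedata` module purely for illustration; it is not what XeTeX or expl3 do internally. Canonical decomposition (NFD) leaves the character untouched. Only compatibility decomposition (NFKD) splits it, and the result is ECH + YIWN.

```python
import unicodedata

EW = "\u0587"  # ARMENIAN SMALL LIGATURE ECH YIWN (և)

# NFD applies only canonical decompositions; U+0587 has none, so it is unchanged.
nfd = unicodedata.normalize("NFD", EW)

# NFKD also applies compatibility decompositions: U+0587 -> U+0565 U+0582.
nfkd = unicodedata.normalize("NFKD", EW)

# NFKC does not rebuild the ligature: compatibility mappings are never
# recomposed, so the result stays as two separate letters.
nfkc = unicodedata.normalize("NFKC", EW)

# "Sun": reformed spelling (U+0561 U+0580 U+0587) vs classical
# spelling (U+0561 U+0580 U+0565 U+0582), as quoted later in the thread.
reformed_sun = "\u0561\u0580\u0587"
classical_sun = "\u0561\u0580\u0565\u0582"

for label, s in [("NFD", nfd), ("NFKD", nfkd), ("NFKC", nfkc)]:
    print(label, " ".join(f"U+{ord(c):04X}" for c in s))

# Compatibility decomposition turns the reformed spelling into the classical one.
print(unicodedata.normalize("NFKD", reformed_sun) == classical_sun)
```

Note that the NFKD result is the classical ECH YIWN sequence. Producing the reformed ԵՎ discussed in the quoted messages would still need a separate, locale-dependent substitution.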


Peter

> On 1 May 2022, at 19:13, Zdenek Wagner <zdenek.wag...@gmail.com> wrote:
> 
> շնորհակալություն (thank you) for the confirmation. I believe that there
> are people who know how to fix it.
> 
> Zdeněk Wagner
> http://ttsm.icpf.cas.cz/team/wagner.shtml
> 
> On Sun, 1 May 2022 at 19:06, DALALYAN Arnak
> <arnak.dalal...@ensae.fr> wrote:
>> 
>> Dear All,
>> 
>> I confirm that there are two correct uppercase versions of և: the reformed
>> spelling is ԵՎ, whereas the classical spelling is ԵՒ. Note that this has
>> nothing to do with Eastern or Western Armenian; both variants of Armenian
>> may use either spelling. But the official language in Armenia is
>> Eastern Armenian, and the official spelling is the reformed one.
>> Therefore, I believe a good way for the uppercase command to operate would
>> be to output ԵՎ by default, but to provide a "classical" option that
>> outputs ԵՒ when it is activated.
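A minimal Python sketch of the proposed behaviour, assuming a hypothetical `armenian_upper` helper with a `classical` flag. Neither exists in polyglossia, expl3 or any other package; the names are made up for illustration. The ligature is replaced by two ordinary letters before the standard case mapping runs.

```python
# Hypothetical helper, not part of any LaTeX package: uppercase Armenian
# text, choosing how the ECH-YIWN ligature (U+0587, "և") is spelled.
EW_LIGATURE = "\u0587"
ECH_VEW = "\u0565\u057E"   # եվ, reformed spelling
ECH_YIWN = "\u0565\u0582"  # եւ, classical spelling


def armenian_upper(text: str, classical: bool = False) -> str:
    """Uppercase `text`; և becomes ԵՎ by default, ԵՒ when classical=True."""
    replacement = ECH_YIWN if classical else ECH_VEW
    # Split the ligature into two ordinary letters first, then let the
    # ordinary one-to-one case mappings handle each letter.
    return text.replace(EW_LIGATURE, replacement).upper()


sun = "\u0561\u0580\u0587"  # արև
print(armenian_upper(sun))                  # U+0531 U+0550 U+0535 U+054E (ԱՐԵՎ)
print(armenian_upper(sun, classical=True))  # U+0531 U+0550 U+0535 U+0552 (ԱՐԵՒ)
```

With `classical=True` the output coincides with Unicode's default full case mapping (Python's `str.upper()` already turns և into ԵՒ), so only the reformed default needs the extra substitution.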
>> 
>> Just to dive a bit deeper in this topic, it is true that և was originally a 
>> ligature but now it is a full letter in the reformed spelling.
>> 
>> The aim of my response was to confirm what was already more or less
>> mentioned in Zdeněk's messages. But I fear I can't help with fixing what
>> LaTeX is doing now.
>> 
>> Best regards,
>> Arnak
>> ________________________________________
>> From: Jonathan Kew [jfkth...@gmail.com]
>> Sent: Sunday, May 1, 2022 2:10 PM
>> To: XeTeX (Unicode-based TeX) discussion.; Zdenek Wagner
>> Cc: serguei.dach...@math.univ-bpclermont.fr; DALALYAN Arnak; 
>> vakop...@yahoo.com
>> Subject: Re: [XeTeX] Uppercase in Armenian
>> 
>> Hi Zdeněk,
>> 
>> Checking the Unicode character database[1], U+0587 is listed as having a
>> *compatibility* decomposition to <0565,0582> (not 0587):
>> 
>> 0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;
>> 
>> Likewise, the SpecialCasing.txt file[2] that defines case mappings other
>> than simple 1:1 substitutions shows the same decomposition for the
>> uppercase form:
>> 
>> 0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
>> 
>> So if I understand correctly, what \text_uppercase:n is doing is simply
>> implementing what the Unicode standard defines.
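A small Python sketch that parses the two UCD lines quoted above (copied verbatim rather than downloaded) to show which field carries which mapping. The field positions follow the format descriptions in the UCD documentation; the final check relies on Python's `str.upper()` implementing the unconditional full case mappings from SpecialCasing.txt.

```python
# The two lines quoted above, copied verbatim (no network access needed).
UNICODE_DATA = "0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;"
SPECIAL_CASING = "0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN"


def hex_seq(field: str) -> str:
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


# UnicodeData.txt: field 5 is the decomposition, optionally prefixed by a
# tag such as <compat> (no tag means a canonical decomposition).
ud = UNICODE_DATA.split(";")
tag, _, mapping = ud[5].partition(" ")
decomposition = hex_seq(mapping)

# SpecialCasing.txt: code; lower; title; upper; (condition;) # comment
data, _, _comment = SPECIAL_CASING.partition("#")
code, lower, title, upper = [f.strip() for f in data.split(";")[:4]]

print(ud[1], tag, [f"U+{ord(c):04X}" for c in decomposition])
print("upper:", [f"U+{ord(c):04X}" for c in hex_seq(upper)])

# Python's str.upper() applies the same full (one-to-many) mapping.
print("\u0587".upper() == hex_seq(upper))
```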
>> 
>> If this isn't the appropriate behavior, at least for some locales, I
>> believe that will need custom programming at some level, but I don't
>> know enough about it to get into any details.
>> 
>> As for whether xelatex (or other engines) form a ligature from one (or
>> other) of the decomposed sequences, that would be entirely in the hands
>> of the font developer. I guess such ligatures are not implemented widely
>> (if at all).
>> 
>> JK
>> 
>> [1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
>> [2] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt
>> 
>>> On 01/05/2022 12:50, Zdenek Wagner wrote:
>>> Hi David,
>>> 
>>> when trying to explain it in greater detail I found that the situation
>>> is even more complex. As I wrote, I follow Elena Yerevan on YouTube and
>>> Facebook, so all I know comes from her videos, from her name
>>> written in both alphabets, from Wikipedia and from
>>> https://omniglot.com/writing/armenian.htm, which means that I know
>>> essentially nothing. We need clarification from people who know Armenian
>>> (հայերէն and/or հայերեն), therefore I am sending a Cc to Arthur and the
>>> authors of the ArmTeX project (hopefully at least one of the addresses
>>> still exists).
>>> 
>>> I will start with the typical use case. The title of a chapter in the
>>> book class is written in lowercase and displayed that way in the chapter
>>> title as well as in the table of contents, but appears in uppercase in
>>> the running head. This is why uppercasing must work correctly.
>>> 
>>> The case of ligatures is different. My fonts have not only ff, fl, and
>>> fi ligatures but even ffi and ffl. If I find the word "difficult" on a web
>>> page using a serif font, I see the ffi ligature, but the source shows that
>>> it consists of the individual characters f, f, i and the ligature was
>>> created by the shaping engine. If I copy it and paste it into a text editor
>>> such as vim or Notepad, I will get the three characters. If I use it as TeX
>>> source and typeset it with Computer Modern or Latin Modern, I will get
>>> the ffi ligature and \uppercase will work. If I copy U+0587 from a web
>>> page and paste it into a text editor, I will get U+0587. I tried both
>>> U+0565 U+0582 (եւ) and U+0565 U+057E (եվ), but neither of them forms the
>>> U+0587 (և) ligature in XeLaTeX. I did not understand why the ligature is
>>> considered to be ECH + YIWN, but it seems to be historical and bound to
>>> the shape.
>>>
>>> If I understand correctly, "sun" is pronounced "arew" in Armenian;
>>> արև (U+0561 U+0580 U+0587) is the Eastern spelling, while
>>> արեւ (U+0561 U+0580 U+0565 U+0582) is the classical spelling (as given
>>> in Wiktionary) and probably also the Western one. As you can see on
>>> Omniglot, the Armenian names of Eastern/Western Armenian start with
>>> "arew", written with these two spellings. Even "hayeren" (Armenian) is
>>> spelled differently in the Eastern/Western variants (I have included both
>>> at the beginning of this mail).
>>>
>>> Having found the information on variants, I saw that polyglossia supports
>>> variant=western. I tried to specify variant=eastern, but it did not help.
>>> If you look at ot6enc.def, it defines uppercase variants at the end of
>>> the file, where the uppercase version of \armew is \Arm@yechvev, which is
>>> \Armyech\Armvev. I cannot test it because I do not know the
>>> transliteration, but just from the names of the characters it seems to me
>>> that it works correctly while \text_uppercase:n does not. It should know
>>> that U+0587 should be decomposed to U+0565 U+057E (not U+0582) and then
>>> uppercased to U+0535 U+054E (not U+0552), at least for the Eastern
>>> variant. I am not sure whether there are other issues and where exactly
>>> to fix it.
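To see exactly which code points a given spelling contains (for example, to tell վ VEW from ւ YIWN), a small Python helper built on `unicodedata.name` can be used; `describe` is a hypothetical name chosen for this sketch. The last line contrasts the Armenian case with a precomposed Latin ligature, U+FB03, which the default case mapping likewise splits into separate letters, whereas an ffi ligature formed by the shaping engine is three ordinary characters to begin with.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point of `text` with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]


# The two spellings of "arew" (sun) quoted above.
for word in ("\u0561\u0580\u0587", "\u0561\u0580\u0565\u0582"):
    print(word, describe(word))

# Uppercase targets for U+0587 under the two spelling conventions.
print(describe("\u0535\u054E"))  # reformed:  ԵՎ
print(describe("\u0535\u0552"))  # classical: ԵՒ

# A precomposed Latin ligature is also split by the default case mapping.
print("\ufb03".upper(), describe("\ufb03".upper()))
```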
>>> 
>>> Zdeněk Wagner
>>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>>> 