Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Zdenek Wagner
Hi David,

when trying to explain it in greater detail, I found that the situation is
even more complex. As I wrote, I follow Elena Yerevan on YouTube and
Facebook, so everything I know comes from her videos, from her name written
in both alphabets, from Wikipedia, and from Omniglot
(https://omniglot.com/writing/armenian.htm), which means that I know next to
nothing. We need clarification from people who know Armenian (հայերէն
and/or հայերեն), so I am sending a Cc to Arthur and the authors of the
ArmTeX project (hopefully at least one of the addresses still exists).

I will start with the typical use case. The title of a chapter in the book
class is written in lowercase and displayed that way in the chapter title
as well as in the table of contents, but it appears in uppercase in the
running head. This is why uppercasing has to work correctly.

The case of ligatures is different. My fonts have not only ff, fl, and fi
ligatures but even ffi and ffl. If I find the word "difficult" on a web page
set in a serif font, I see the ffi ligature, but the source shows the
individual characters f, f, i; the ligature was created by the shaping
engine. If I copy it and paste it into a text editor such as vim or
Notepad, I get the three characters. If I use it as TeX source and typeset
it with Computer Modern or Latin Modern, I get the ffi ligature and
\uppercase works. If I copy U+0587 from a web page and paste it into a text
editor, I get U+0587. I tried both U+0565 U+0582 (եւ) and U+0565 U+057E
(եվ), but neither of them forms the U+0587 (և) ligature in XeLaTeX. I did
not understand why the ligature is considered ECH and YIWN, but it seems
that this is historical and bound to the shape.

If I understand it well, the word for "sun" is pronounced "arew" in
Armenian; արև (U+0561 U+0580 U+0587) is the Eastern spelling, whereas արեւ
(U+0561 U+0580 U+0565 U+0582) is the classical spelling (as given in
Wiktionary) and probably also the Western one. As you can see on Omniglot,
the Armenian names of Eastern/Western Armenian start with "arew" in these
two spellings. Even "hayeren" (Armenian) is spelled differently in the
Eastern/Western variants (I have included both at the beginning of this
mail). Having found the information on variants, I saw that polyglossia
supports variant=western. I tried to specify variant=eastern, but it did
not help. If you look at ot6enc.def, it defines uppercase variants at the
end of the file, where the uppercase version of \armew is \Arm@yechvev,
which is \Armyech\Armvev. I cannot test it because I do not know the
transliteration, but judging just from the names of the characters, it
seems to me that it works correctly while \text_uppercase:n does not. It
should know that U+0587 should be decomposed to U+0565 U+057E (not U+0582)
and then uppercased to U+0535 U+054E (not U+0552), at least for the Eastern
variant. I am not sure whether there are other issues and where exactly to
fix it.
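The contrast between a ligature built by the shaping engine and a precomposed ligature code point can be reproduced outside TeX. The Python sketch below assumes that \uppercase behaves like a one-to-one per-character table (which is what \uccode provides) and approximates that table with the hypothetical helper `simple_upper`, which leaves alone any character whose full Unicode uppercase expands to several code points; Python's `str.upper` stands in for full Unicode casing.

```python
def simple_upper(text: str) -> str:
    """Approximate a one-to-one (uccode-style) uppercase mapping.

    Characters whose full Unicode uppercase mapping expands to more than
    one code point (such as U+FB03 or U+0587) are left unchanged, because a
    one-to-one table cannot express a one-to-many mapping.
    """
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def codepoints(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


samples = {
    "f, f, i typed separately": "difficult",
    "precomposed ffi (U+FB03)": "di\uFB03cult",
    "Yerevan typed with U+0587": "\u0535\u0580\u0587\u0561\u0576",
}

for label, word in samples.items():
    one_to_one = simple_upper(word)
    full = word.upper()
    print(label)
    print("  one-to-one:", one_to_one, "|", codepoints(one_to_one))
    print("  full:      ", full, "|", codepoints(full))
```

Only the separately typed "difficult" comes out fully uppercased under the one-to-one table; the two precomposed characters pass through unchanged, which matches the \uppercase behaviour described in this thread.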

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml


On Sun 1 May 2022 at 8:08, David Carlisle wrote:

> the input uses a ligature character which has no corresponding single
> uppercase character; you need to decompose the ligature before uppercasing,
> which \uppercase can't do. You see the same with the Latin-script ff
> ligature:
>
> \uppercase{diff diﬀ}
>
> looks like DIFF DIﬀ as the second one uses U+FB00 which has no one-to-one
> uppercase.
>
>  U+0587 ARMENIAN SMALL LIGATURE ECH YIWN
>
> would be better input as U+0565 U+0582
>
> David
>
>
>
>
> On Sun, 1 May 2022 at 00:09, Zdenek Wagner 
> wrote:
>
>> Yes, it looks better, but the uppercase version should contain ԵՎ, not ԵՒ.
>> Վ is capital vew (U+054E) while Ւ is capital yiwn (U+0552).
>>
>> Zdeněk Wagner
>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>>
>>
>> On Sun 1 May 2022 at 0:53, David Carlisle wrote:
>>
>>> Something like this, I think.
>>>
>>> [image: image.png]
>>>
>>> \documentclass{article}
>>> \usepackage{polyglossia}
>>>
>>> \setdefaultlanguage{armenian}
>>> \setmainfont{DejaVu Sans}
>>> \ExplSyntaxOn
>>> \let\tuppercase\text_uppercase:n
>>> \ExplSyntaxOff
>>> \pagestyle{empty}
>>> \begin{document}
>>> Երևան $\rightarrow$ \uppercase{Երևան}
>>>
>>> Երևան $\rightarrow$ \tuppercase{Երևան}
>>>
>>> \end{document}
>>>
>>> David
>>>
>>>
>>> On Sat, 30 Apr 2022 at 22:15, Zdenek Wagner 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> first I should mention that I do not know Armenian at all and can just
>>>> recognize a few characters. Anyway, I came across a problem which
>>>> probably cannot be solved by the standard \lccode / \uccode method.
>>>> What I mean is Yerevan, which is written as Երևան, but YEREVAN (all
>>>> caps) is ԵՐԵՎԱՆ because և has no uppercase variant and must be
>>>> replaced by the two characters ԵՎ. At least, that is what I see in
>>>> Elena Yerevan's songs shot in the city of Yerevan. The following file
>>>>
>>>> \documentclass{article}
>>>> \usepackage{polyglossia}
>>>> \setdefaultlanguage{armenian}
>>>> \setmainfont{DejaVu Sans}
>>>> \pagestyle{empty}
>>>> \begin{document}
>>>> Երևան $\rightarrow$ \uppe

Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Jonathan Kew

Hi Zdeněk,

Checking the Unicode character database[1], U+0587 is listed as having a 
*compatibility* decomposition to <0565,0582> (not <0565,057E>):


0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;

Likewise, the SpecialCasing.txt file[2] that defines case mappings other 
than simple 1:1 substitutions shows the same decomposition for the 
uppercase form:


0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN

So if I understand correctly, what \text_uppercase:n is doing is simply 
implementing what the Unicode standard defines.
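The two records can be decoded field by field. A minimal Python sketch, assuming the semicolon-separated layouts of UnicodeData.txt (decomposition in the sixth field, where a `<compat>` tag marks a compatibility mapping) and SpecialCasing.txt (code; lower; title; upper); the helper names are hypothetical, and the results are cross-checked against Python's `unicodedata` module and `str.upper`, which are built from the same database.

```python
import unicodedata

# Records as they appear in UnicodeData.txt and SpecialCasing.txt.
UNICODE_DATA = "0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;"
SPECIAL_CASING = "0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN"


def to_text(hex_seq: str) -> str:
    """Convert a space-separated list of hex code points into a string."""
    return "".join(chr(int(h, 16)) for h in hex_seq.split())


def parse_decomposition(record: str) -> tuple[str, str]:
    """Return (tag, mapping) from the decomposition field of a UnicodeData.txt record.

    The tag is e.g. '<compat>' for compatibility decompositions and ''
    for canonical ones.
    """
    field = record.split(";")[5]
    if field.startswith("<"):
        tag, hex_seq = field.split(" ", 1)
    else:
        tag, hex_seq = "", field
    return tag, to_text(hex_seq)


def parse_special_casing(record: str) -> dict[str, str]:
    """Return the lower/title/upper mappings of a SpecialCasing.txt record."""
    data = record.split("#", 1)[0]
    code, lower, title, upper = (f.strip() for f in data.split(";")[:4])
    return {
        "code": to_text(code),
        "lower": to_text(lower),
        "title": to_text(title),
        "upper": to_text(upper),
    }


tag, mapping = parse_decomposition(UNICODE_DATA)
casing = parse_special_casing(SPECIAL_CASING)
print(tag, [f"U+{ord(c):04X}" for c in mapping])
print({k: [f"U+{ord(c):04X}" for c in v] for k, v in casing.items()})

# Python's unicodedata ships the same database, so the two should agree.
assert unicodedata.decomposition("\u0587") == "<compat> 0565 0582"
assert "\u0587".upper() == casing["upper"]
```

Both files point to YIWN (U+0582 / U+0552), never to VEW (U+057E / U+054E), which is why a default implementation yields ԵՒ.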


If this isn't the appropriate behavior, at least for some locales, I 
believe that will need custom programming at some level, but I don't 
know enough about it to get into any details.


As for whether xelatex (or other engines) form a ligature from one (or 
other) of the decomposed sequences, that would be entirely in the hands 
of the font developer. I guess such ligatures are not implemented widely 
(if at all).


JK

[1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
[2] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt


Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Joseph Wright

On 01/05/2022 13:10, Jonathan Kew wrote:

If this isn't the appropriate behavior, at least for some locales, I 
believe that will need custom programming at some level, but I don't 
know enough about it to get into any details.


Indeed: we will add support for alternative casing for Armenian to 
\text_uppercase:nn shortly.


Joseph


Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Zdenek Wagner
Hi all,

as I wrote, my knowledge is based only on Facebook, YouTube, and the
texts and videos on Omniglot. I can send you exact links to YouTube
videos, but you have to be fast: it is displayed for only a second or
two. I think that the Eastern variant (Արևելահայերեն) is used in Armenia
and that the decomposition of 0587 should be <0565, 057E>. I know nothing
about other variants.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml



Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Philip Taylor (Hellenic Institute)

On 01/05/2022 13:27, Zdenek Wagner wrote:

> Hi all,
>
> as I wrote, my knowledge is based just on facebook and youtube and
> texts and videos on Omniglot. I can send you exact links to youtube
> videos but you should be fast, it is displayed for a second or two.


Youtube URLs can include a timestamp, so provided that autoplay is 
disabled in the player anything displayed at that time should remain 
on-screen indefinitely.


Example: https://www.youtube.com/watch?v=5A3NYsUCne4&t=9s
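For building such links by hand, the timestamp is just the `t` query parameter. A small Python sketch, assuming the `t=<seconds>s` form used in the example above; `youtube_at` is a hypothetical helper, not part of any YouTube API.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def youtube_at(video_id: str, seconds: int) -> str:
    """Build a watch URL whose `t` parameter asks the player to start at `seconds`."""
    query = urlencode({"v": video_id, "t": f"{seconds}s"})
    return f"https://www.youtube.com/watch?{query}"


url = youtube_at("5A3NYsUCne4", 9)
print(url)  # https://www.youtube.com/watch?v=5A3NYsUCne4&t=9s

# Reading the offset back out of an existing link:
params = parse_qs(urlparse(url).query)
print(params["t"])  # ['9s']
```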

--
/** Phil./



Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Zdenek Wagner
OK, but I have to open it twice. The first time it jumps to an
advertisement, and I have to play it and close it. On the second attempt
it jumps to the advertisement again, but then it goes to the timestamp
and stays there.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml



Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Zdenek Wagner
Hi all,

I did not manage to create a link with a timestamp, but I found the
coat of arms of Yerevan:
https://commons.wikimedia.org/wiki/File:Coat_of_%D0%B0rms_of_Yerevan.svg

The text on it is calligraphic, but you should recognize U+0535 U+0550
U+0535 U+054E U+0531 U+0546 (ԵՐԵՎԱՆ). If you unfold the names in other
languages, you will see "Yerevani", where -i is the ending that forms an
adjective (the same ending is used in Persian and Hindi). If you go down
to the usage section, the first link is Yerevan, and you should recognize
U+0535 U+0580 U+0587 U+0561 U+0576 (Երևան).
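The reading can be checked mechanically. This Python sketch spells out the code points of the inscription and of the running-text form, and compares the inscription with Python's default (SpecialCasing-based) uppercase of Երևան; the two differ only in the fourth character, U+0552 (Ւ) instead of U+054E (Վ).

```python
def codepoints(text: str) -> str:
    return " ".join(f"{ord(ch):04X}" for ch in text)


inscription = "\u0535\u0550\u0535\u054E\u0531\u0546"  # ԵՐԵՎԱՆ, as on the coat of arms
name = "\u0535\u0580\u0587\u0561\u0576"               # Երևան, as in running text

print(codepoints(inscription))      # 0535 0550 0535 054E 0531 0546
print(codepoints(name))             # 0535 0580 0587 0561 0576
print(codepoints(name.upper()))     # 0535 0550 0535 0552 0531 0546
print(name.upper() == inscription)  # False
```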

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml



Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Zdenek Wagner
շնորհակալություն – thank you for confirmation. I believe that there
are people who know how to fix it.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml


Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Peter von Kaehne
Generally speaking (and I am speaking here very much as a novice in LaTeX and
without any knowledge of Armenian), one nearly always does better to decompose
and normalise all UTF-8 texts, then work on them, and only then recompose. And
I think this approach would have worked out fine here.
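How far normalization gets here can be tested directly. In this Python sketch, U+0587 survives NFD untouched because its decomposition is only a compatibility one; NFKD does split it, but uppercasing the result still yields U+0535 U+0552 (ԵՒ), the classical form, and canonical recomposition (NFC) does not restore U+0587. Note that NFKD also folds every other compatibility character in the text, which may not be wanted.

```python
import unicodedata


def codepoints(text: str) -> str:
    return " ".join(f"{ord(ch):04X}" for ch in text)


name = "\u0535\u0580\u0587\u0561\u0576"  # Երևան

for form in ("NFD", "NFKD"):
    decomposed = unicodedata.normalize(form, name)
    print(form, codepoints(decomposed), "->", codepoints(decomposed.upper()))
# NFD 0535 0580 0587 0561 0576 -> 0535 0550 0535 0552 0531 0546
# NFKD 0535 0580 0565 0582 0561 0576 -> 0535 0550 0535 0552 0531 0546

# "Recomposing" after NFKD: NFC only applies canonical compositions,
# so the compatibility ligature U+0587 does not come back.
recomposed = unicodedata.normalize("NFC", unicodedata.normalize("NFKD", name))
print(recomposed == name)  # False
```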

Peter



Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Joseph Wright

On 30/04/2022 23:52, David Carlisle wrote:

Something like this, I think.

[image: image.png]

\documentclass{article}
\usepackage{polyglossia}

\setdefaultlanguage{armenian}
\setmainfont{DejaVu Sans}
\ExplSyntaxOn
\let\tuppercase\text_uppercase:n
\ExplSyntaxOff
\pagestyle{empty}
\begin{document}
Երևան $\rightarrow$ \uppercase{Երևան}

Երևան $\rightarrow$ \tuppercase{Երևան}

\end{document}

David


The next expl3 release will include hy-x-yiwn as a language setting, 
allowing


   \newcommand\tuppercase{\text_uppercase:nn{hy-x-yiwn}}

in place of the \let line (inside \ExplSyntaxOn ... \ExplSyntaxOff) in
David's example; this variant will use the alternative mapping.

Joseph


Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread DALALYAN Arnak
Dear All,

I confirm that there are two correct uppercase versions of և: the reformed
spelling is ԵՎ, whereas the classical spelling is ԵՒ. Note that this has nothing
to do with Eastern or Western Armenian; both variants of the language may use
either spelling. But the official language in Armenia is Eastern Armenian, and
the official spelling is the reformed one. Therefore, I believe a good way for
the uppercase command to operate would be to output ԵՎ by default, but to have
a "classical" option that outputs ԵՒ when it is activated.
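The behaviour proposed here (reformed ԵՎ by default, classical ԵՒ on request) can be sketched as a tailoring applied before the default Unicode mapping. The Python below is only an illustration: `upper_hy` and its `classical` flag are hypothetical names, not the expl3 interface, and only U+0587 is tailored.

```python
# Hypothetical helper sketching the proposed behaviour; it is not the
# expl3 implementation.
ECH_YIWN = "\u0587"               # և
REFORMED_UPPER = "\u0535\u054E"   # ԵՎ (reformed spelling)
CLASSICAL_UPPER = "\u0535\u0552"  # ԵՒ (classical spelling, also the Unicode default)


def upper_hy(text: str, classical: bool = False) -> str:
    """Uppercase Armenian text, choosing how to expand U+0587.

    U+0587 is replaced first; every other character then goes through
    Python's default full Unicode uppercase mapping.
    """
    expansion = CLASSICAL_UPPER if classical else REFORMED_UPPER
    return text.replace(ECH_YIWN, expansion).upper()


name = "\u0535\u0580\u0587\u0561\u0576"  # Երևան
print(upper_hy(name))                    # ԵՐԵՎԱՆ
print(upper_hy(name, classical=True))    # ԵՐԵՒԱՆ
```

Replacing the ligature before the general mapping keeps the tailoring local: no other character's casing changes, and the default path stays identical to plain Unicode casing apart from U+0587.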

Just to dive a bit deeper into this topic: it is true that և was originally a
ligature, but it is now a full letter in the reformed spelling.

The aim of my response was to confirm what was already more or less mentioned
in Zdeněk's messages. But I fear I can't help with fixing what LaTeX is doing
now.

Best regards,
Arnak
