On Sunday, July 16, 2017 at 4:09:16 AM UTC+5:30, Mikhail V wrote:
> On Sat, 15 Jul 2017 05:50 pm, Marko Rauhamaa wrote:
> > Random access to code points is as uninteresting as random access to
> > UTF-8 bytes.
> > I might want random access to the "Grapheme clusters, a.k.a. real
> > characters".
>
> What _real_ characters are you referring to?
> If your data has "á" (U00E1), then it is one real character,
> if you have "a" (U0061) and "ˊ" (U02CA) then it is _two_
> real characters. So in both cases you have access to code points =
> real characters.
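A minimal sketch, using only the standard library's `unicodedata`, of the distinction being argued above. One detail worth noting: U+02CA (MODIFIER LETTER ACUTE ACCENT) is a spacing modifier letter, not a combining mark; the combining acute accent is U+0301, and that is the one NFC folds into U+00E1. The `approx_graphemes` helper is hypothetical and only approximates grapheme clusters by attaching combining marks (categories Mn, Mc, Me) to the preceding character; it is not a full UAX #29 segmenter.

```python
import unicodedata

precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
combining = "a\u0301"    # "a" + COMBINING ACUTE ACCENT
modifier = "a\u02ca"     # "a" + MODIFIER LETTER ACUTE ACCENT (spacing, not combining)

def approx_graphemes(text):
    """Hypothetical helper: group each character with any combining marks
    (categories Mn, Mc, Me) that follow it. A rough approximation of
    grapheme clusters, not a full UAX #29 implementation."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

for label, s in [("precomposed", precomposed), ("combining", combining), ("modifier", modifier)]:
    # label, code point count, approximate cluster count, code points
    print(label, len(s), len(approx_graphemes(s)), [f"U+{ord(c):04X}" for c in s])

# NFC folds "a" + U+0301 into the single code point U+00E1 ...
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
# ... but leaves "a" + U+02CA alone, since U+02CA is not a combining mark.
print(unicodedata.normalize("NFC", modifier) == modifier)      # True
```

So `len()` counts code points either way; "a" + U+0301 is two code points but one cluster, while "a" + U+02CA really is two of each. The standard `re` module has no `\X` escape for grapheme clusters; the third-party `regex` module does.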
Right now in an adjacent mailing list (debian) I see someone signed off with a
grüß
I guess the third character is a u with some ‘dirt’.
What's the fourth?

>
> For metaphysical discussion - in _my_ definition there

s/metaphysical/linguistic

> is no such "real" character as "á", since it is the "a" glyph with some dirt,
> so according to my definition, it should be two separate characters,
> both semantically and technically seen.
>
> And, in my definition, the whole Unicode is a huge junkyard, to start with.
>
> But opinions may vary, and in case you prefer or forced to write "á",
> then it can be impractical to store it as two characters, regardless of
> encoding.

Heck, even in the English that I learnt in school we had ægis, homœopath, etc.
And just now looking up
https://en.wikipedia.org/wiki/List_of_words_that_may_be_spelled_with_a_ligature
I see economics is œconomics!!

Seriously, the word "ligature", like the word "grapheme", is misleading.
It's not a graphical or typographic notion; it's an atom of the language's lexicon.
No Hindi speaker seeing क + ई = की calls the last anything but a letter.
And the vowel sign ी is never a first-class vowel.

-- 
https://mail.python.org/mailman/listinfo/python-list
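A small sketch of how Python 3's `unicodedata` classifies the characters mentioned above (standard library only; the `describe` helper is hypothetical). It shows that ü decomposes under NFD into u plus COMBINING DIAERESIS while ß has no decomposition; that Unicode names æ a LETTER but œ a LIGATURE, yet neither has a compatibility decomposition, unlike the purely typographic ﬁ (U+FB01); and that की is two code points, KA plus the vowel sign II, with no precomposed form.

```python
import unicodedata

def describe(text):
    """Hypothetical helper: (code point, category, name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in text]

# "grüß": four code points in NFC; NFD splits ü into u + U+0308, ß stays whole.
word = "gr\u00fc\u00df"
print(len(word), len(unicodedata.normalize("NFD", word)))  # 4 5
print("\u00df".upper())                                    # SS

# Ligatures: Unicode calls æ a letter and œ a ligature, but NFKC leaves both
# unchanged, whereas the typographic ligature ﬁ (U+FB01) becomes "fi".
for ch in ["\u00e6", "\u0153", "\ufb01"]:
    print(describe(ch), repr(unicodedata.normalize("NFKC", ch)))

# Devanagari: की is KA (U+0915) + VOWEL SIGN II (U+0940, a spacing combining
# mark, category Mc), not KA + the independent vowel ई (U+0908).
ki = "\u0915\u0940"
for row in describe(ki):
    print(row)
print(unicodedata.normalize("NFC", ki) == ki)  # True: no precomposed form
```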