Mikhail V <mikhail...@gmail.com>:

> On Sat, 15 Jul 2017 05:50 pm, Marko Rauhamaa wrote:
>> Random access to code points is as uninteresting as random access to
>> UTF-8 bytes. I might want random access to the "Grapheme clusters,
>> a.k.a.real characters".
>
> What _real_ characters are you referring to?
> If your data has "á" (U00E1), then it is one real character,
> if you have "a" (U0061) and "ˊ" (U02CA) then it is _two_
> real characters. So in both cases you have access to code points =
> real characters.
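
For what it's worth, the code points mentioned in the quote can be inspected directly. The following is only a minimal sketch using the standard `unicodedata` module; the helper names `code_points` and `canonically_equal` are made up for illustration. One assumption to flag: the U02CA in the quote is a spacing modifier letter, whereas the sequence that Unicode treats as canonically equivalent to U+00E1 is "a" followed by the combining acute accent U+0301, so the sketch shows both.

```python
import unicodedata

precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # "a" + COMBINING ACUTE ACCENT
spacing = "a\u02ca"      # "a" + MODIFIER LETTER ACUTE ACCENT (the U02CA quoted above)


def code_points(s):
    """Return the code points of s in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(c):04X}" for c in s]


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, s in [("precomposed", precomposed),
                     ("decomposed", decomposed),
                     ("spacing", spacing)]:
        print(label, code_points(s), "len =", len(s))
    print(canonically_equal(precomposed, decomposed))
    print(canonically_equal(precomposed, spacing))
```

Under these assumptions, a plain `==` between `precomposed` and `decomposed` is false even though both render as the same letter, which is the gap between code points and what a user would call a character.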

It's true that confusion is caused by the ambiguity of the term
"character."

> For metaphysical discussion - in _my_ definition there is no such
> "real" character as "á", since it is the "a" glyph with some dirt, so
> according to my definition, it should be two separate characters, both
> semantically and technically seen.

Here's the problem: when the human user types in "á" (with one, two or
three keyclicks), they don't know how the computer represents it
internally. The Unicode standard allows for two *equivalent* code point
sequences (<URL: https://en.wikipedia.org/wiki/Unicode_equivalence>).
When the computer outputs the sequence, the visible result is the single
letter "á". The human user doesn't know, or care, about the internal
representation.

The user's expectation is that the visible letter "á" should behave like
any other single letter. For example, a text editor should move the
cursor past it with a single click of a left or right arrow key. Also,
if I perform a regular-expression search in the editor and look for

   Alv[aá]rez

I should get a match with either Alvarez or Alvárez.

> And, in my definition, the whole Unicode is a huge junkyard, to start
> with.

I don't think anybody denies that. However, it's the best thing
available and, more importantly, a universally accepted standard.

> But opinions may vary, and in case you prefer or forced to write "á",
> then it can be impractical to store it as two characters, regardless
> of encoding.

Now I'm not following you.


Marko

-- 
https://mail.python.org/mailman/listinfo/python-list
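
P.S. The Alv[aá]rez search and the cursor expectation above can be sketched in Python with the standard library alone. This is a rough illustration under stated assumptions, not how any particular editor does it: `search_normalized` and `approx_clusters` are hypothetical helper names, and `approx_clusters` merely attaches combining marks to the preceding code point, which is a simplified stand-in for the full grapheme cluster rules of UAX #29.

```python
import re
import unicodedata


def nfc(s):
    """Normalize s to NFC so equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", s)


def search_normalized(pattern, text):
    """Run re.search after normalizing both the pattern and the text to NFC."""
    return re.search(nfc(pattern), nfc(text))


def approx_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified approximation of grapheme clusters, not full UAX #29.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the combining mark to its base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


if __name__ == "__main__":
    pattern = "Alv[a\u00e1]rez"
    decomposed_name = "Alva\u0301rez"   # "a" + COMBINING ACUTE ACCENT

    # A naive search fails on the decomposed form, a normalized one succeeds.
    print(re.search(pattern, decomposed_name))
    print(search_normalized(pattern, decomposed_name))

    # Cursor movement would step over clusters, not individual code points.
    print(len(decomposed_name), len(approx_clusters(decomposed_name)))
```

Normalizing both sides to one form is only one possible design. I believe the third-party `regex` module offers `\X` for matching whole grapheme clusters, which would be the more thorough route.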