On Tue, Jul 17, 2018 at 4:22 AM, Richard Damon <rich...@damon-family.org> wrote:
>
> But I am not talking about those sort of characters or ligatures, but
> ‘characters’ that are built up of a combining diacritical marks (like
> accents) and a base character. Unicode define many code points for the more
> common of these, but many others do not.
>
So, you're talking about "grapheme clusters". Those can be arbitrarily large and complex. Trolls revel in the ability to adorn base characters with ridiculous numbers of "dripping" marks, making it hard to type their names.

Since the amount of information in one grapheme cluster is (as far as I know) potentially infinite, it's fundamentally impossible to create a fixed-size encoding that can represent them. If I'm wrong about the possibilities being infinite, then they are certainly very extensive, as there are MANY combining characters available. The only question is whether you can use the same combining character multiple times. If you can, there are infinitely many possibilities; if you can't, the count is on the order of base_characters * 2^combining_characters, aka "virtually infinite".

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

This is a display feature, not an input feature, and certainly not a string representation feature.

ChrisA
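
For anyone who wants to poke at this in Python, here is a minimal sketch using only the standard unicodedata module. The helper names (describe, naive_cluster_count) are made up for illustration. Counting characters whose canonical combining class is zero is only a rough stand-in for the TR29 boundary rules, not an implementation of them, and the final count of combining marks depends on the Unicode version your Python ships with.

```python
import sys
import unicodedata


def describe(text):
    """Return (code points as given, code points after NFC normalization)."""
    composed = unicodedata.normalize("NFC", text)
    return len(text), len(composed)


def naive_cluster_count(text):
    """Rough approximation: treat every character with canonical combining
    class 0 as the start of a new cluster. Real TR29 rules are more involved."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# "e" + COMBINING ACUTE ACCENT has a precomposed code point (U+00E9).
e_acute = "e\u0301"
# "q" + COMBINING ACUTE ACCENT has no precomposed code point.
q_acute = "q\u0301"
# Nothing stops you from stacking the same mark many times on one base.
stacked = "q" + "\u0301" * 50

print(describe(e_acute))             # (2, 1)
print(describe(q_acute))             # (2, 2)
print(len(stacked))                  # 51
print(naive_cluster_count(stacked))  # 1

# Assumption: "combining characters" here means code points with a nonzero
# canonical combining class. Even without repeats, the number of subsets
# (2 ** marks) is astronomically large.
marks = sum(
    1 for cp in range(sys.maxunicode + 1) if unicodedata.combining(chr(cp))
)
print(marks, "combining marks;", len(str(2 ** marks)), "digits in 2**marks")
```

The point of the last line is only to put a rough size on "virtually infinite": no fixed-width integer is going to index that many combinations, and that is before allowing repeated marks.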