Re: Glyphs and graphemes [was Re: Cult-like behaviour]

Chris Angelico Mon, 16 Jul 2018 11:55:53 -0700

On Tue, Jul 17, 2018 at 4:22 AM, Richard Damon <rich...@damon-family.org> wrote:
>
> But I am not talking about those sorts of characters or ligatures, but 
> ‘characters’ that are built up of combining diacritical marks (like 
> accents) and a base character. Unicode defines code points for the more 
> common of these, but not for many others.
>


So, you're talking about "grapheme clusters". Those can be arbitrarily
large and complex. Trolls revel in the ability to adorn base
characters with ridiculous numbers of "dripping" marks, making it hard
to type their names. As far as I know, the amount of information in one
grapheme cluster is potentially unbounded, so it's fundamentally
impossible to create a fixed-size encoding that can represent every
one of them. Even if I'm wrong about that, the possibilities are
certainly very extensive, as there are MANY combining characters
available. The only question is whether you can use the same
character multiple times. If you can, the possibilities are infinite;
if not, they are bounded by roughly
base_characters * 2^combining_characters, aka "virtually infinite".
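
To make that concrete, here is a minimal sketch using only the standard
library's unicodedata module. The helper name stack_marks and the sample
strings are made up for illustration. It shows one combination that has a
precomposed code point, one that does not, and the fact that nothing in
the str type stops you from repeating a combining mark.

```python
import unicodedata

# "e" + COMBINING ACUTE ACCENT has a precomposed code point (U+00E9),
# so NFC normalization collapses the pair into a single code point.
e_acute = "e\u0301"
composed = unicodedata.normalize("NFC", e_acute)
print(len(e_acute), len(composed))  # 2 1

# "q" + COMBINING ACUTE ACCENT has no precomposed form, so NFC leaves
# it as two code points.
q_acute = "q\u0301"
print(len(unicodedata.normalize("NFC", q_acute)))  # 2


def stack_marks(base, mark, count):
    """Hypothetical helper: append `count` copies of `mark` to `base`."""
    return base + mark * count


# One base character carrying fifty copies of the same combining mark.
zalgo = stack_marks("a", "\u0301", 50)
print(len(zalgo))  # 51
```

Whether a renderer draws all fifty marks sensibly is a separate matter;
the string itself holds them without complaint.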

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

Grouping code points into grapheme clusters is a display feature, not an
input feature, and certainly not a string representation feature.
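
As a rough illustration of that split, the sketch below groups code
points after the fact, on top of an ordinary str. It is deliberately
crude and is NOT the TR29 algorithm: it only attaches characters with a
nonzero canonical combining class to the preceding character, so it
ignores ZWJ sequences, Hangul jamo, regional indicators, and marks whose
combining class is zero. The function name rough_clusters is my own.

```python
import unicodedata


def rough_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified approximation only; not a UAX #29 implementation.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns the canonical combining class,
        # which is nonzero for marks such as U+0308 COMBINING DIAERESIS.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


name = "Zoe\u0308"  # "Zoë" spelled with a combining diaeresis
print(len(name))                  # 4 code points
print(len(rough_clusters(name)))  # 3 user-perceived characters
```

The string stores four code points either way; the count of three only
appears once something groups them for presentation.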

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
