Chris Angelico wrote:
* Strings with all codepoints < 256 are represented as they currently are (one byte per char). There are no combining characters in the first 256 codepoints anyway.
* Strings with all codepoints < 65536 and no combining characters, ditto (two bytes per char).
* Strings with any combining characters in them are stored in four bytes per char even if all codepoints are < 65536.
* Any time a character consists of a single base with no combining, it is stored in UTF-32.
* Combined characters are stored in the primary array as 0x80000000 plus the index into a secondary array where these values are stored.
* The secondary array has a pointer for each combined character (ignoring single-code-point characters), probably to a Python integer object for simplicity.
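The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CPython stores strings: the names `encode`, `decode` and `COMBINED_FLAG` are made up for the example, "combining character" is taken to mean a nonzero `unicodedata.combining()` class, and each secondary-array entry is a tuple of code points rather than the single Python integer object suggested in the list.

```python
import unicodedata

# Assumed name: the high bit marks a primary-array slot as an index
# into the secondary array instead of a plain code point.
COMBINED_FLAG = 0x80000000


def encode(text):
    """Return (bytes_per_char, primary, secondary) for text."""
    # Group each base code point with the combining marks that follow it.
    # A combining mark with no preceding base becomes its own cluster.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1].append(ord(ch))
        else:
            clusters.append([ord(ch)])

    # Any combining character forces the four-byte representation.
    if any(unicodedata.combining(ch) for ch in text):
        width = 4
    else:
        top = max(map(ord, text), default=0)
        width = 1 if top < 256 else 2 if top < 65536 else 4

    primary, secondary = [], []
    for cluster in clusters:
        if len(cluster) == 1:
            # Single base, no combining: store the code point directly.
            primary.append(cluster[0])
        else:
            # Combined character: store flag + index, park the code
            # points in the secondary array (a tuple is an assumption).
            primary.append(COMBINED_FLAG + len(secondary))
            secondary.append(tuple(cluster))
    return width, primary, secondary


def decode(primary, secondary):
    """Rebuild the original string from the two arrays."""
    out = []
    for unit in primary:
        if unit & COMBINED_FLAG:
            out.extend(chr(cp) for cp in secondary[unit - COMBINED_FLAG])
        else:
            out.append(chr(unit))
    return "".join(out)
```

With this layout, indexing into `primary` yields one slot per base-plus-marks character, which is the point of the proposal; the cost is that a single combining mark anywhere in the string widens every slot to four bytes.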
+1. We should totally do this just to troll the RUE!

--
Greg