This is neither a complaint nor a question, just a comment. In the previous discussion related to the flexible string representation, Roy Smith added this comment:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/2645504f459bab50/eda342573381ff42

Not only do I agree with his sentence:

"Clearly, the world has moved to a 32-bit character set."

but he also used a very interesting word in his comment: "punctuation".

There is a point which is, in my mind, not very well understood, "digested", underestimated or neglected by many developers: the relation between the coding of the characters and typography.

Unicode (the consortium) does not only deal with the coding of the characters, it also works on the *classification* of the characters.

A deliberately simplistic representation: "letters" at the bottom of the table, lower code points/integers; "typographic characters" like punctuation, common symbols, ... high in the table, high code points/integers.

The conclusion is inescapable: if one wishes to work in a "unicode mode", one is forced to use the whole palette of the unicode code points. This is the *nature* of Unicode.

Technically, believing that it is possible to optimize only a subrange of the unicode code points range is simply an illusion. It is a lot of work, probably quite complicated, which finally solves nothing.

Python, in my mind, fell into this trap.

"Simple is better than complex."
   -> hard to maintain
"Flat is better than nested."
   -> code points range
"Special cases aren't special enough to break the rules."
   -> special unicode code points?
"Although practicality beats purity."
   -> or the opposite?
"In the face of ambiguity, refuse the temptation to guess."
   -> guessing a user will only work with the "optimized" char subrange.
...

Small illustration. Take an A4 page containing 50 lines of 80 ascii characters, add a single 'EM DASH' or a 'BULLET' (code points > 0x2000), and you will see all the optimization efforts destroyed.

>>> import sys
>>> sys.getsizeof('a' * 80 * 50)
4025
>>> sys.getsizeof('a' * 80 * 50 + '•')
8040

Just my 2 € (code point 0x20ac) cents.

jmf
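For anyone who wants to reproduce the illustration on their own interpreter, here is a minimal sketch. It assumes CPython 3.3 or later (the flexible string representation); other implementations may store strings differently. The helper name `bytes_per_char` is hypothetical, introduced only for this example. Absolute sizes depend on the build (32-bit or 64-bit), so the helper differences two sizes to cancel the fixed object header.

```python
import sys

def bytes_per_char(sample, n=1000):
    """Estimate storage per character for strings made only of `sample`.

    Differencing two lengths cancels the fixed per-object header,
    whose size varies between builds.
    """
    return (sys.getsizeof(sample * 2 * n) - sys.getsizeof(sample * n)) / n

# The "A4 page" from the illustration: 50 lines of 80 ASCII characters.
page = 'a' * 80 * 50

for label, text in [
    ('ASCII only', page),
    ('plus one U+2022 BULLET', page + '\u2022'),
    ('plus one U+1F600 (outside the BMP)', page + '\U0001F600'),
]:
    # Print length and total size; exact byte counts depend on the build.
    print(label, len(text), sys.getsizeof(text))

# Per-character width chosen by the interpreter for each kind of string.
for label, ch in [('ASCII', 'a'), ('BULLET', '\u2022'), ('non-BMP', '\U0001F600')]:
    print(label, bytes_per_char(ch))
```

Under the stated assumption, the second loop should report 1, 2 and 4 bytes per character respectively: one character from a higher range is enough to widen the storage of the whole string.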