On Thursday, 23 August 2012 18:17:29 UTC+5:30, (unknown) wrote:
> This is neither a complaint nor a question, just a comment.
>
> In the previous discussion related to the flexible
> string representation, Roy Smith added this comment:
>
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/2645504f459bab50/eda342573381ff42
>
> Not only do I agree with his sentence:
> "Clearly, the world has moved to a 32-bit character set."
>
> but he also used a very interesting word in his comment: "punctuation".
>
> There is a point which is, in my mind, not very well understood,
> "digested", underestimated or neglected by many developers:
> the relation between the coding of the characters and typography.
>
> Unicode (the consortium) does not only deal with the coding of
> the characters; it has also worked on their *classification*.
>
> A deliberately simplistic representation: "letters" at the bottom
> of the table (low code points/integers); "typographic characters"
> like punctuation, common symbols, ... high in the table (high code
> points/integers).
>
> The conclusion is inescapable: if one wishes to work in a "Unicode
> mode", one is forced to use the whole palette of Unicode
> code points; this is the *nature* of Unicode.
>
> Technically, believing that it is possible to optimize only a subrange
> of the Unicode code point range is simply an illusion. It is a lot of
> work, probably quite complicated, which in the end solves nothing.
>
> Python, in my mind, fell into this trap.
>
> "Simple is better than complex."
> -> hard to maintain
> "Flat is better than nested."
> -> code point range
> "Special cases aren't special enough to break the rules."
> -> special Unicode code points?
> "Although practicality beats purity."
> -> or the opposite?
> "In the face of ambiguity, refuse the temptation to guess."
> -> guessing a user will only work with the "optimized" char subrange.
> ...
>
> A small illustration: take an A4 page containing 50 lines of 80 ASCII
> characters, add a single 'EM DASH' or 'BULLET' (code points > 0x2000),
> and you will see all the optimization efforts destroyed.
>
> >>> import sys
> >>> sys.getsizeof('a' * 80 * 50)
> 4025
> >>> sys.getsizeof('a' * 80 * 50 + '•')
> 8040
>
> Just my 2 € (code point 0x20ac) cents.
>
> jmf
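A minimal sketch of the mechanism the post is reacting to, assuming CPython 3.3+ and its PEP 393 "flexible string representation": the number of bytes stored per code point is picked from the string's largest code point (1 byte up to U+00FF, 2 bytes up to U+FFFF, 4 bytes beyond). The helper names `storage_kind` and `measured_kind` are hypothetical. The first is only a model of the rule, not CPython's actual code; the second infers the width from `sys.getsizeof`, whose results are CPython-specific and not guaranteed by other interpreters.

```python
import sys


def storage_kind(s: str) -> int:
    """Hypothetical model of the PEP 393 rule: bytes per code point for `s`.

    The narrowest of 1 byte (max code point <= U+00FF), 2 bytes
    (<= U+FFFF) or 4 bytes is chosen from the largest code point.
    """
    highest = max(map(ord, s), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def measured_kind(s: str) -> int:
    """Hypothetical helper: estimate bytes per code point via sys.getsizeof.

    CPython-specific. Appending one more copy of the widest character
    keeps the storage width unchanged and adds exactly one code point,
    so the reported size grows by the per-code-point width.
    """
    widest = max(s, key=ord, default="a")
    # Build both strings at run time so neither carries a cached UTF-8
    # copy, which CPython would otherwise include in sys.getsizeof.
    shorter = s + widest
    longer = shorter + widest
    return sys.getsizeof(longer) - sys.getsizeof(shorter)


page = "a" * 80 * 50  # 50 lines of 80 ASCII characters
print(storage_kind(page), storage_kind(page + "\u2022"))    # 1 2
print(measured_kind(page), measured_kind(page + "\u2022"))  # 1 2 on CPython
```

For the page in the post, the pure-ASCII text is stored at 1 byte per code point, and a single U+2022 BULLET (or U+2014 EM DASH, or U+20AC €) switches the whole string to 2 bytes per code point, which is consistent with the reported size roughly doubling from 4025 to 8040. A character above U+FFFF, such as an emoji, would switch it to 4 bytes per code point.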
The Zen of Python is simply a guideline.

-- 
http://mail.python.org/mailman/listinfo/python-list