In article <mailman.3531.1345416176.4697.python-l...@python.org>, Chris Angelico <ros...@gmail.com> wrote:
> Really, the only viable alternative to PEP 393 is a fixed 32-bit
> representation - it's the only way that's guaranteed to provide
> equivalent semantics. The new storage format is guaranteed to take no
> more memory than that, and provide equivalent functionality.

In the primordial days of computing, using 8 bits to store a character
was a profligate waste of memory. What on earth did people need with TWO
cases of the alphabet (not to mention all sorts of weird punctuation)?
Eventually, memory became cheap enough that the convenience of using one
character per byte (not to mention 8-bit bytes) outweighed the costs.
And crazy things like sixbit and rad-50 got swept into the dustbin of
history.

So it may be with utf-8 someday.

Clearly, the world has moved to a 32-bit character set. Not all parts of
the world know that yet, or are willing to admit it, but that doesn't
negate the fact that it's true. Equally clearly, the concept of one
character per byte is a big win. The obvious conclusion is that
eventually, when memory gets cheap enough, we'll all be doing utf-32 and
all this transcoding nonsense will look as antiquated as rad-50 does
today.
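
For a rough sense of the memory trade-off being argued about, here is a minimal sketch, assuming Python 3.3 or later. The sample strings and the helper name encoded_sizes are made up for illustration; it compares encoded byte counts only and says nothing about how CPython stores strings internally.

```python
# Hypothetical sample strings, one per "width class" of code point.
samples = {
    "ascii": "hello",
    "latin-1": "h\u00e9llo",        # e with acute accent
    "bmp": "h\u20acllo",            # euro sign
    "astral": "h\U0001F600llo",     # code point outside the BMP
}


def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for text."""
    # "utf-32-le" is used so that no byte-order mark is added to the count.
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-32-le"))


for name, text in samples.items():
    n, u8, u32 = encoded_sizes(text)
    print(f"{name:8} code points={n} utf-8={u8} utf-32={u32}")
```

UTF-32 always costs four bytes per code point, which is what buys constant-time indexing; UTF-8 costs between one and four, which is where the transcoding and variable-width bookkeeping come from.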