In article <4f375347$0$29986$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> ASCII truly is a blight on the world, and the sooner it fades into
> obscurity, like EBCDIC, the better.

That's a fair statement, but it's also fair to say that at the time it came out (49 years ago!) it was a revolutionary improvement on the extant state of affairs (every manufacturer inventing their own code, and often different codes for different machines). Given the cost of both computer memory and CPU cycles at the time, sticking to a 7-bit code (the 8th bit was for parity) was a necessary evil.

As Steven D'Aprano pointed out, it was missing some commonly used US symbols such as ¢ or ©. That was a small price to pay for the simplicity ASCII afforded. It wasn't a bad encoding. It was a very good encoding. But the world has moved on, and computing hardware has become cheap enough that supporting richer encodings and character sets is realistic.

And, before people complain about the character set being US-centric, keep in mind that the A in ASCII stands for American, and it was published by ANSI (whose A also stands for American). I'm not trying to wave the flag here, just pointing out that it was never intended to be anything other than a national character set.

Part of the complexity of Unicode is that when people switch from working with ASCII to working with Unicode, they're really having to master two distinct things at the same time (and often conflate them into a single confusing mess). One is the Unicode character set. The other is a specific encoding (UTF-8, UTF-16, etc.). Not to mention silly things like the BOM (Byte Order Mark).

I expect that some day, storage costs will become so cheap that we'll all just be using UTF-32, and programmers of the day will wonder how their poor parents and grandparents ever managed in a world where nobody quite knew what you meant when you asked, "how long is that string?"
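
To make that concrete, here's a quick Python 3 sketch (the sample string and byte counts are just my own illustration) of why "how long is that string?" has several answers once you separate the character set from the encoding:

    # A str is a sequence of Unicode code points; its size in bytes
    # depends entirely on which encoding you choose.
    s = "naïve café"                    # 10 code points

    print(len(s))                       # 10 -- length in code points
    print(len(s.encode("utf-8")))       # 12 -- ï and é take two bytes each
    print(len(s.encode("utf-16-le")))   # 20 -- two bytes per code point here
    print(len(s.encode("utf-32-le")))   # 40 -- four bytes per code point, always

    # The BOM adds its own wrinkle: plain "utf-16" prepends a
    # Byte Order Mark, so the same text grows by two bytes.
    print(len(s.encode("utf-16")))      # 22 -- 20 bytes of text + 2-byte BOM

Same ten characters, five different answers, depending on whether you're counting code points or bytes, and in which encoding.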