In article <mailman.5739.1329084873.27778.python-l...@python.org>, Chris Angelico <ros...@gmail.com> wrote:
> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy <tjre...@udel.edu> wrote:
> > The situation before ascii is like where we ended up *before* unicode.
> > Unicode aims to replace all those byte encodings and character sets with
> > *one* byte encoding for *one* character set, which will be a great
> > simplification. It is the idea of ascii applied on a global rather than
> > local basis.
>
> Unicode doesn't deal with byte encodings; UTF-8 is an encoding, but so
> are UTF-16, UTF-32, and as many more as you could hope for. But
> broadly yes, Unicode IS the solution.

I could hope for one and only one, but I know I'm just going to be
disappointed. The last project I worked on used UTF-8 in most places,
but also used some C and Java libraries which were only available for
UTF-16. So it was transcoding hell all over the place.

Hopefully, we will eventually reach the point where storage is so cheap
that nobody minds how inefficient UTF-32 is and we all just start using
that. Life will be a lot simpler then. No more transcoding, a string
will be exactly four bytes per character, and everybody will be happy
again.
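
To put rough numbers on the trade-off, here is a minimal sketch in Python 3 (3.3 or later, so that len() counts code points). The sample strings and the helper names encoded_sizes and utf8_to_utf16 are made up for illustration and have nothing to do with the project mentioned above. The "-le" codec variants are used so that no byte order mark gets counted.

```python
# Hypothetical sample strings, chosen only to cover the different size classes.
SAMPLES = ["hello", "na\u00efve", "\u65e5\u672c\u8a9e", "\U0001F600"]


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf8_to_utf16(data):
    """Transcode UTF-8 bytes to UTF-16 (little-endian) bytes via str."""
    return data.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    for sample in SAMPLES:
        # len(sample) is the number of code points, not bytes.
        print(ascii(sample), len(sample), encoded_sizes(sample))
```

In UTF-32 every code point takes four bytes, so the byte count is always 4 * len(s). UTF-8 and UTF-16 vary per code point, which is why each library boundary in a mixed code base needs a round trip along the lines of utf8_to_utf16.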