On 02/12/2012 05:27 PM, Roy Smith wrote:
> In article <mailman.5739.1329084873.27778.python-l...@python.org>,
>  Chris Angelico <ros...@gmail.com> wrote:

>> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy <tjre...@udel.edu> wrote:
>>> The situation before ascii is like where we ended up *before* unicode.
>>> Unicode aims to replace all those byte encodings and character sets with
>>> *one* byte encoding for *one* character set, which will be a great
>>> simplification. It is the idea of ascii applied on a global rather than a
>>> local basis.
>> Unicode doesn't deal with byte encodings; UTF-8 is an encoding, but so
>> are UTF-16, UTF-32, and as many more as you could hope for. But
>> broadly yes, Unicode IS the solution.
> I could hope for one and only one, but I know I'm just going to be
> disappointed.  The last project I worked on used UTF-8 in most places,
> but also used some C and Java libraries which were only available for
> UTF-16.  So it was transcoding hell all over the place.

> Hopefully, we will eventually reach the point where storage is so cheap
> that nobody minds how inefficient UTF-32 is and we all just start using
> that.  Life will be a lot simpler then.  No more transcoding, a string
> will take exactly as many 32-bit code units as it has characters, and
> everybody will be happy again.

Keep your in-memory character strings as Unicode, and only serialize (encode) them when they go to/from a device, or to/from anachronistic code. Then the cost is realized at the point of the problem. No different than when deciding how to serialize any other data type. Do it only at the point of entry/exit of your program.
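
A minimal sketch of that discipline in Python 3 (the file names and the choice of UTF-8 are just placeholders, not a recommendation for any particular program):

    # Entry point: decode bytes from the device exactly once.
    with open("input.txt", "rb") as f:
        text = f.read().decode("utf-8")    # bytes -> str (Unicode)

    # Everything in between works on Unicode strings only.
    report = text.upper()

    # Exit point: encode back to bytes exactly once.
    with open("output.txt", "wb") as f:
        f.write(report.encode("utf-8"))    # str -> bytes

The middle of the program never sees bytes, so there is only one place where an encoding decision can go wrong.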

But as long as devices are addressed as bytes, or as anything smaller than 32-bit thingies, you will have encoding issues when writing to the device, and decoding issues when reading. At the very least, you have big-endian/little-endian ways to encode that UCS-4 code point.
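
For example, serializing the same code point with the two byte orders in a Python 3 interpreter (just to illustrate the point):

    >>> "A".encode("utf-32-be")    # big-endian: high byte first
    b'\x00\x00\x00A'
    >>> "A".encode("utf-32-le")    # little-endian: low byte first
    b'A\x00\x00\x00'
    >>> "A".encode("utf-32")       # native order plus a BOM (little-endian machine shown)
    b'\xff\xfe\x00\x00A\x00\x00\x00'

Either way, what hits the wire is bytes; the str object itself has no byte order.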

--
http://mail.python.org/mailman/listinfo/python-list
