On Mon, Dec 18, 2000 at 03:21:05PM +0000, Nick Ing-Simmons wrote:
> Simon Cozens <[EMAIL PROTECTED]> writes:
> >
> >So, before we start even thinking about what we need, it's time to look at the
> >vexed question of string representation. How do we do Unicode without getting
> >into the horrendous non-Latin1 cockups we're seeing on p5p right now?
>
> Well - my theorist's answer is that everything is Unicode - like Java.
That would be nice, yes.
> As I pointed out on p5p even EBCDIC machines can use that model - but
> the downside is that ord('A') == 65 which will break backward compatibility
> with EBCDIC scripts.
Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too?
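
To make that concrete, here is a rough sketch of what I mean, written in
Python purely for illustration (it is not Perl, and nothing like it exists
in the core). PERL_ENCODING is the variable suggested above; native_ord()
and native_chr() are made-up names, and cp037 is just one EBCDIC code page
that Python happens to ship a codec for.

```python
import os


def _native_encoding():
    # PERL_ENCODING is the hypothetical variable from the message above;
    # falling back to Latin-1 is an assumption made for this sketch.
    return os.environ.get("PERL_ENCODING", "latin-1")


def native_ord(ch):
    """Return the code of one character in the native single-byte encoding."""
    data = ch.encode(_native_encoding())
    if len(data) != 1:
        raise ValueError("character is not a single byte in the native encoding")
    return data[0]


def native_chr(code):
    """Inverse of native_ord(): map a native byte value back to a character."""
    return bytes([code]).decode(_native_encoding())
```

With PERL_ENCODING=cp037 an old EBCDIC script would still see 'A' as 0xC1
(193) through native_ord(), while the internal value stays at 65.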
> Tagging a string with a repertoire and encoding is horrible - you are aware
Indeed. We have had a very rough ride trying to get just two
encodings to play well together; trying to support more simultaneously
would be pure combinatorial masochism. I say we should strive for
converting everything to/from one agreed-upon internal encoding. Yes,
this is somewhat counter to the idea 'no preferred internal encoding'.
After pondering the issue I have come around to "Oh, yes, there
should be one preferred internal encoding"; otherwise we condemn
ourselves to much gnashing of teeth. Offhand, I think the only time
the One True Encoding conversion shouldn't be done is when it would
lose information. What's the OTE, then? Well, UTF-16 or
UTF-32, I guess. The redeeming features of UTF-8, that it is 1:1 for
ASCII and also compact for ASCII, are frankly getting rather thin in
my eyes.
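
A small sketch of both points, again in Python only for illustration.
The names encoded_sizes() and to_internal() are invented here, and the
"only convert when nothing is lost" rule is approximated by a simple
round-trip check, which is an assumption of this sketch rather than a
worked-out policy.

```python
def encoded_sizes(text):
    """Byte length of the same text in the three candidate encodings."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


def to_internal(raw, source_encoding):
    """Decode bytes into the single internal form, refusing lossy input.

    Invalid input raises UnicodeDecodeError (a ValueError subclass); input
    that does not re-encode to the same bytes raises ValueError.
    """
    text = raw.decode(source_encoding)
    if text.encode(source_encoding) != raw:
        raise ValueError("conversion would not round-trip; keep the raw bytes")
    return text
```

For pure ASCII such as "hello", UTF-8 takes 5 bytes against 10 for UTF-16
and 20 for UTF-32; for text like "日本語" UTF-8 takes 9 bytes and UTF-16
only 6, so the compactness argument holds only for the ASCII case.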
> The only sane compromise I can imagine is close to what we have at the
> moment with maybe a few extra special cases in the "flags" bits:
> ASCII only (0..7f)
> Native-single-byte (iso8859-x, IBM1047)
> wchar_t
> UTF-8
> UNICODE
Maybe.
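
For the sake of discussion, the quoted flag list might look something like
the following. This is Python rather than the C the core would use, the
names StrFlag and pick_flag() are invented, and the rule "use the narrowest
representation that fits" is my assumption, not something Nick proposed.

```python
from enum import IntFlag


class StrFlag(IntFlag):
    ASCII = 1      # code points 0..0x7f only
    NATIVE = 2     # native single-byte (iso8859-x, IBM1047)
    WCHAR = 4      # platform wchar_t; not chosen by pick_flag() below
    UTF8 = 8
    UNICODE = 16   # listed in the proposal; not chosen by pick_flag() below


def pick_flag(text, native="latin-1"):
    """Pick the narrowest of ASCII / NATIVE / UTF8 that can hold the text."""
    if all(ord(c) < 0x80 for c in text):
        return StrFlag.ASCII
    try:
        text.encode(native)
    except UnicodeEncodeError:
        return StrFlag.UTF8
    return StrFlag.NATIVE
```

Every operation that combines two strings then has to cope with each pair
of flags, which is exactly the combinatorial growth I would like to avoid.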
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen