At 05:54 PM 9/7/2001 -0400, Bryan C. Warnock wrote:
>On Friday 07 September 2001 05:51 pm, Dan Sugalski wrote:
> > >(Like
> > >Unicode Everywhere).
> >
> > Who's doing that? We're keeping things in native format as much as we can.
>
>If one of our stated goals is Unicode support (even for the source itself -
>that's what I meant by "everywhere": source, input, output), we're going to
>be a little more hindered than if we didn't have to worry about it at all,
>no?
No. We don't want Unicode everywhere because:
*) Conversion to Unicode is sometimes lossy
*) Conversion back out of Unicode is sometimes lossy
*) Converting is wasted cycles when we already know how to work on the
underlying string data
*) Lots of folks working outside 7-bit ASCII already have perfectly adequate
character sets with well-defined operations, so why should they have to use
Unicode if they don't need it?
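The first two points can be checked mechanically. Below is a minimal sketch in Python, using its standard codecs as a stand-in for whatever string libraries the interpreter ends up with. The helper names round_trips and to_legacy are hypothetical, made up for illustration. A round trip fails when bytes aren't valid in the codec, or when the codec maps more than one byte sequence to the same code point (cp932's duplicate vendor extensions are the usual example). Going back out fails when the target charset simply lacks the character.

```python
def round_trips(data: bytes, codec: str) -> bool:
    """Hypothetical helper: True if data decodes to Unicode and re-encodes
    to exactly the same bytes, i.e. the trip through Unicode was lossless."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeError:
        # The bytes aren't valid in this codec at all.
        return False


def to_legacy(text: str, codec: str) -> bytes:
    """Hypothetical helper: encode Unicode text into a legacy charset.
    Characters the charset lacks become b'?', so information is lost."""
    return text.encode(codec, errors="replace")
```

For example, to_legacy("price: €5", "latin-1") yields b"price: ?5", because ISO-8859-1 has no euro sign.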
Unicode's sort of a least-common-multiple character set. We'll use it if
we need to, but it's no panacea, unfortunately.
>Or will you only compare Granny Smiths with Granny Smiths?
If you compare, say, a Shift-JIS string to a Big5/traditional string,
they'll probably both end up converted to Unicode and the results
compared. (That assumes neither the Big5/traditional nor the Shift-JIS
string library knows how to convert to the other losslessly.) And a plain
string comparison for gt/lt is less straightforward than you might think...
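Here's a rough Python sketch of that policy. It uses native bytes when both strings share an encoding and falls back to Unicode only for mixed encodings. EncodedString and text_equal are hypothetical names, not a description of the real string implementation. The native fast path assumes a stateless codec that maps each character to exactly one byte sequence. The tail end shows why gt/lt is tricky: Unicode code point order isn't a collation order, and canonically equivalent strings can differ code point by code point unless you normalize first.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical string that keeps its bytes in their native encoding."""
    data: bytes
    codec: str  # a Python codec name, e.g. "shift_jis" or "big5"

    def to_unicode(self) -> str:
        # Strict decoding: raise instead of silently losing data.
        return self.data.decode(self.codec)


def text_equal(a: EncodedString, b: EncodedString) -> bool:
    if a.codec == b.codec:
        # Native fast path, no conversion cycles spent. Assumes a stateless
        # codec with one byte sequence per character.
        return a.data == b.data
    # Mixed encodings: use Unicode as common ground, normalizing to NFC so
    # canonically equivalent sequences compare equal.
    return (unicodedata.normalize("NFC", a.to_unicode())
            == unicodedata.normalize("NFC", b.to_unicode()))


# Code point order is not dictionary order: "A" (U+0041) < "a" (U+0061)
# < "z" (U+007A) < "é" (U+00E9), so "éclair" sorts after "zebra".
codepoint_sorted = sorted(["zebra", "Apple", "éclair", "apple"])

# Precomposed é versus e + combining acute accent: unequal as raw code
# points, equal after normalization.
raw_equal = "\u00e9" == "e\u0301"
nfc_equal = unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```

With this sketch, text_equal on "日本" stored once as Shift-JIS and once as Big5 comes out True, even though the two byte strings differ. Meanwhile codepoint_sorted ends up as ['Apple', 'apple', 'zebra', 'éclair'], which isn't what most people mean by lt.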
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk