Re: Constant strings - again

Dan Sugalski Wed, 21 Apr 2004 14:16:58 -0700

At 11:04 PM +0300 4/21/04, Jarkko Hietaniemi wrote:

> We need to address that, then. If we're doing
> unicode, we damn well need to do it right--å is
> å, regardless of whether it's composed or
> decomposed.


Agreed -- on some level.  But if we want to implement Larry's
:u0 (bytes) and :u1 (code points) levels, we also need to have
the "more raw" comparisons available, somehow.  (I do not remember
whether Larry specified that :u2 would do some of the Unicode
normalizations by default, thus doing (de)compositions.)

We'll work that out when the Perl 6 compiler gets to that point. For Parrot, my preference (unless ICU makes it infeasible, which I doubt) is to keep everything decomposed. I hear rumor that way's preferred... :)
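To make the composed/decomposed point concrete, here is a minimal sketch in Python, not Parrot or ICU code. It assumes that "keep everything decomposed" corresponds to Unicode Normalization Form D (NFD); the helper names `to_decomposed` and `same_text` are made up for illustration.

```python
import unicodedata

# Two spellings of the same character:
composed = "\u00e5"      # LATIN SMALL LETTER A WITH RING ABOVE, one code point
decomposed = "a\u030a"   # 'a' followed by COMBINING RING ABOVE, two code points


def to_decomposed(s: str) -> str:
    """Return s in canonical decomposed form (assumed here to be NFD)."""
    return unicodedata.normalize("NFD", s)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to decomposed form."""
    return to_decomposed(a) == to_decomposed(b)


# The raw code point sequences differ, but the normalized forms agree.
raw_equal = composed == decomposed
normalized_equal = same_text(composed, decomposed)
```

If strings are normalized once on the way in, later comparisons can work on the stored form directly instead of normalizing on every comparison.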

> If people want low-level binary comparisons (and
> generally we *shouldn't* for most things) then
> they'll need to force the string to binary.


And I'm not certain whether "forcing to binary" is the right
visual image or approach here.  Maybe we need some sort of
"pragma" support so that we can tweak the ":u level"?  The
default level could well be :u2, the highest we can do without
picking some "language" rules.
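As a rough illustration of what a tweakable comparison level could look like, here is a Python sketch. Only the :u0 (bytes) and :u1 (code points) meanings come from this thread; treating level 2 as "compare after NFD normalization" is an assumption, since whether :u2 normalizes is left open above. `Encoded` and `equal_at_level` are hypothetical names.

```python
import unicodedata
from typing import NamedTuple


class Encoded(NamedTuple):
    """Hypothetical container: raw bytes plus the encoding they are stored in."""
    data: bytes
    encoding: str


def equal_at_level(a: Encoded, b: Encoded, level: int) -> bool:
    """Compare two strings at the requested level."""
    if level == 0:
        # Level 0: raw bytes, no interpretation at all.
        return a.data == b.data
    text_a = a.data.decode(a.encoding)
    text_b = b.data.decode(b.encoding)
    if level == 1:
        # Level 1: code point sequences, independent of encoding.
        return text_a == text_b
    if level == 2:
        # Level 2 (assumption): compare canonically decomposed forms.
        return (unicodedata.normalize("NFD", text_a)
                == unicodedata.normalize("NFD", text_b))
    raise ValueError(f"unsupported level: {level}")


# The same character stored three different ways.
latin1 = Encoded("\u00e5".encode("latin-1"), "latin-1")
utf8 = Encoded("\u00e5".encode("utf-8"), "utf-8")
utf8_decomposed = Encoded("a\u030a".encode("utf-8"), "utf-8")
```

Here `latin1` and `utf8` differ at level 0 but match at level 1, while `utf8` and `utf8_decomposed` only match at level 2.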

I've got a Cunning Plan, oddly enough, though the margins of this e-mail are too small to contain it. As soon as I get it finished I'm going to pass it on to the list and to a few non-list folks who I know are deep into this stuff (Autrijus and Dan Kogai, if I can get in touch. I *really* wish I had someone who did mainly Korean text processing handy...) and we'll see where we go from there. I have no doubt it'll be... fun. Yeah, that's the word, fun!

-- Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
