At 12:22 AM +0200 4/22/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

 Just to make sure... we're making sure the strings are always
 properly decomposed before comparing, right?

Not in the absence of any rules on how to decompose or, better, when ;) We are currently still at Larry's level 0 or 1. Hash values and compare operations are stable, though, up to and including the Unicode codepoint level.

Well, then, I'll make A Big Decision:


All strings, in the absence of explicit overriding of behavior, shall be treated as if they were in Canonical Form. If this is not the case, the strings will be canonicalized first. If a character set has both composed and decomposed versions of some characters, the decomposed version is our canonical form. This includes all hash keys, which means method names and the names of global and lexical variables are all treated as if they were stored in decomposed form whenever they contain decomposable characters.
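
As a rough illustration of what this rule implies (a sketch in Python, not Parrot code), the snippet below assumes the canonical form corresponds to Unicode NFD and uses the standard unicodedata module. The names canonical, canonical_eq and CanonicalKeyTable are made up for the example.

```python
import unicodedata


def canonical(s):
    """Return s in canonical decomposed form (assumed here to be Unicode NFD)."""
    return unicodedata.normalize("NFD", s)


def canonical_eq(a, b):
    """Compare two strings as if both were stored in decomposed form."""
    return canonical(a) == canonical(b)


class CanonicalKeyTable:
    """Hypothetical name table whose keys are canonicalized on every access."""

    def __init__(self):
        self._slots = {}

    def store(self, name, value):
        # Canonicalize before hashing so equivalent spellings share one slot.
        self._slots[canonical(name)] = value

    def fetch(self, name):
        return self._slots[canonical(name)]


composed = "caf\u00e9"      # e-acute as a single codepoint
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent

# Codepoint-level comparison sees two different strings ...
raw_equal = composed == decomposed
# ... while comparison after canonicalization treats them as the same.
canon_equal = canonical_eq(composed, decomposed)

table = CanonicalKeyTable()
table.store(composed, "method body")
looked_up = table.fetch(decomposed)
```

Under these assumptions, a method or variable name written with a precomposed character and the same name written with a base character plus combining mark resolve to the same hash entry.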

I can think of a language or two where this might be considered sub-optimal, so I'm willing to work this out, though I'm not sure I want to mix composed and decomposed characters depending on which ones they are.
--
Dan


--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
