At 11:17 AM -0700 4/21/04, Jeff Clites wrote:
On Apr 21, 2004, at 10:20 AM, Dan Sugalski wrote:

At 9:22 AM -0700 4/21/04, Jeff Clites wrote:
On Apr 21, 2004, at 4:05 AM, Leopold Toetsch wrote:

... a factor ~14 performance increase for the "not equal" case.

Ah, great! (And the "not equal" case is the only one which should be showing a speed up--the "same" and "equal" cases are expected to be unaffected.)

Just to make sure... we're making sure the strings are always properly decomposed before comparing, right?

Nope, this is a literal "equal" comparison--you'd build a normalized compare on top of this. (There are two reasons for that: (1) You definitely need a non-normalized comparison available, because often that's what you want, and (2) For normalized comparison, you need to pick which style of normalization you want--there are at least 4 choices, each of which makes sense in different situations.)
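
To make the layering concrete, here is a minimal sketch of a normalized compare built on top of a literal one. It uses Python's standard unicodedata module purely for illustration; it is not Parrot's string API, and the names literal_equal and normalized_equal are hypothetical. The four forms shown (NFC, NFD, NFKC, NFKD) are the standard Unicode normalization forms, which is presumably what the "4 choices" refers to.

```python
import unicodedata

def literal_equal(a: str, b: str) -> bool:
    # Code-point-for-code-point comparison, no normalization applied.
    return a == b

def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    # Normalize both operands to the chosen form, then fall back on
    # the literal comparison. The caller has to pick the form.
    return literal_equal(unicodedata.normalize(form, a),
                         unicodedata.normalize(form, b))

composed = "\u00e5"     # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"  # 'a' followed by COMBINING RING ABOVE

literal_differs = not literal_equal(composed, decomposed)
canonical_same = all(normalized_equal(composed, decomposed, f)
                     for f in ("NFC", "NFD", "NFKC", "NFKD"))

# The choice of form matters: the "fi" ligature U+FB01 only matches
# plain "fi" under the compatibility forms.
ligature_nfc = normalized_equal("\ufb01", "fi", "NFC")
ligature_nfkc = normalized_equal("\ufb01", "fi", "NFKC")
```

The ligature case is one reason a single built-in notion of "equal" is hard to fix in advance: canonical and compatibility forms disagree on it.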

We need to address that, then. If we're doing unicode, we damn well need to do it right--å is å, regardless of whether it's composed or decomposed.


If people want low-level binary comparisons (and generally we *shouldn't* for most things) then they'll need to force the string to binary.
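
As a rough illustration of "forcing the string to binary", the sketch below encodes both strings to bytes and compares the bytes. Again this is Python rather than Parrot, binary_equal is a hypothetical name, and the choice of UTF-8 as the encoding is an assumption made only for the example.

```python
def binary_equal(a: str, b: str, encoding: str = "utf-8") -> bool:
    # Drop down to raw bytes in one fixed encoding and compare those.
    # No Unicode semantics are applied at this level.
    return a.encode(encoding) == b.encode(encoding)

composed = "\u00e5"     # single precomposed code point
decomposed = "a\u030a"  # base letter plus combining mark

composed_bytes = composed.encode("utf-8")      # b'\xc3\xa5'
decomposed_bytes = decomposed.encode("utf-8")  # b'a\xcc\x8a'
same_bytes = binary_equal(composed, decomposed)
```

At the byte level the two spellings of å are simply different sequences, so they compare as unequal.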
--
Dan


--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
