On Apr 21, 2004, at 10:20 AM, Dan Sugalski wrote:
At 9:22 AM -0700 4/21/04, Jeff Clites wrote:
On Apr 21, 2004, at 4:05 AM, Leopold Toetsch wrote:
... a factor ~14 performance increase for the "not equal" case.
Ah, great! (And the "not equal" case is the only one which should be showing a speed up--the "same" and "equal" cases are expected to be unaffected.)
Just to make sure... we're making sure the strings are always properly decomposed before comparing, right?
Nope, this is a literal "equal" comparison--you'd build a normalized compare on top of this. (There are two reasons for that: (1) You definitely need a non-normalized comparison available, because often that's what you want, and (2) For normalized comparison, you need to pick which style of normalization you want--there are at least four choices, each of which makes sense in different situations.)
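To make the distinction concrete, here is a minimal sketch in Python (not Parrot's own code) of a normalized compare layered on top of a literal one. The function names `literal_equal` and `normalized_equal` are hypothetical, chosen only for illustration. The four normalization forms offered by Python's standard `unicodedata` module (NFC, NFD, NFKC, NFKD) are assumed here to be the kind of choices meant above.

```python
import unicodedata

def literal_equal(a: str, b: str) -> bool:
    """Code-point-by-code-point comparison; no normalization applied."""
    return a == b

def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Normalize both strings to the chosen form, then compare literally.

    `form` must be one of "NFC", "NFD", "NFKC", "NFKD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "\u00e5"      # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"   # "a" followed by COMBINING RING ABOVE

print(literal_equal(composed, decomposed))            # False
print(normalized_equal(composed, decomposed, "NFD"))  # True

# The choice of form matters: compatibility forms fold more together.
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
print(normalized_equal(ligature, "fi", "NFC"))        # False
print(normalized_equal(ligature, "fi", "NFKC"))       # True
```

The ligature case shows why a single built-in notion of "equal" is hard to pick: canonical forms keep the ligature distinct from "fi", while compatibility forms treat them as the same text.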
We need to address that, then. If we're doing Unicode, we damn well need to do it right--å is å, regardless of whether it's composed or decomposed.
If people want low-level binary comparisons (and generally we *shouldn't* for most things) then they'll need to force the string to binary.
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk