On Thu, 13 Nov 2003, Leopold Toetsch wrote:

> Dan Sugalski <[EMAIL PROTECTED]> wrote:
>
> > You're going to run into problems no matter what you do, and as
> > transcoding could happen with each comparison arguably you need to make a
> > local copy of the string for each comparison, as otherwise you run the
> > risk of significant data loss as a string gets transcoded back and forth
> > across a lossy boundary.
>
> Here is again, what I already had proposed:
> * as long as there are only ascii keys: noop
> * on first non ascii key, convert all hash to utf8 - doesn't change
>   hash values

Well... this is the place where things fall down. It does change hash
values. You may find yourself transcoding from, say, Shift-JIS to Unicode,
which will result in most (if not all) of the characters in the string
changing code-points. That's likely to change hash values just a little...

> > Regardless, I think at least a single string copy with comparison against
> > that copy within the hash functions is the only way to get correct
> > results.
>
> Yes. That's the point - a single string copy. Now each compare could do
> a transcode i.e. generate a new string.

When I said "at least a single copy" I meant that we might have multiple
copies made, though that will definitely do nasty things to performance.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
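
To make the two points in this message concrete, here is a minimal Python
sketch. It is not Parrot code: key_hash (CRC-32 over the raw bytes),
hashes_by_encoding and lossy_round_trip are hypothetical helpers, and the
ASCII target is only a stand-in for "a lossy boundary". It assumes the
hash is computed over the encoded bytes of the key.

```python
import zlib


def key_hash(data: bytes) -> int:
    # Stand-in hash over the raw encoded bytes (assumption: the real
    # hash function is not shown in this thread).
    return zlib.crc32(data)


def hashes_by_encoding(text: str, encodings: list[str]) -> dict[str, int]:
    # Hash the same logical string once per encoding.
    return {enc: key_hash(text.encode(enc)) for enc in encodings}


def lossy_round_trip(text: str, encoding: str) -> str:
    # Transcode into `encoding` and back; characters the target cannot
    # represent are replaced, so the original cannot be recovered.
    return text.encode(encoding, errors="replace").decode(encoding)


key = "\u65e5\u672c\u8a9e"  # three Japanese characters

# Same string, different bytes, so the stored hash value changes once
# the key is transcoded from Shift-JIS to UTF-8.
print(hashes_by_encoding(key, ["shift_jis", "utf-8"]))

# Pure-ASCII keys have identical bytes in both encodings (the "noop" case).
print(hashes_by_encoding("abc", ["shift_jis", "utf-8"]))

# Crossing a lossy boundary destroys data unless a local copy is kept.
original = key
damaged = lossy_round_trip(key, "ascii")
print(damaged == original)
```

The first call shows why "convert all hash to utf8" implies rehashing
every non-ASCII key, and the last comparison shows why the comparison
should work on a copy rather than transcode the stored string in place.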