On Thu, 13 Nov 2003, Leopold Toetsch wrote:

> Dan Sugalski <[EMAIL PROTECTED]> wrote:
>
> > You're going to run into problems no matter what you do, and as
> > transcoding could happen with each comparison arguably you need to make a
> > local copy of the string for each comparison, as otherwise you run the
> > risk of significant data loss as a string gets transcoded back and forth
> > across a lossy boundary.
>
> Here again is what I had already proposed:
>  * as long as there are only ASCII keys: no-op
>  * on the first non-ASCII key, convert the whole hash to UTF-8 - doesn't
>    change hash values

Well... this is the place where things fall down. It does change hash
values. You may find yourself transcoding from, say, Shift-JIS to Unicode,
which will result in most (if not all) of the characters in the string
changing code-points. That's likely to change hash values just a little...
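
As a rough illustration of the point above (a Python sketch, not Parrot
code), the same key has a different byte sequence in Shift-JIS and in UTF-8,
so any hash computed over the raw bytes is fed different input. The sample
key and the use of CRC-32 as a stand-in hash function are assumptions made
only for this example.

```python
import zlib

# Sample key chosen only for illustration.
key = "\u65e5\u672c\u8a9e"  # the three characters of "Nihongo" (Japanese)

sjis_bytes = key.encode("shift_jis")
utf8_bytes = key.encode("utf-8")

# Same text, different bytes: two bytes per character in Shift-JIS,
# three bytes per character in UTF-8.
sjis_len = len(sjis_bytes)
utf8_len = len(utf8_bytes)

# CRC-32 stands in for the hash function here (an assumption; the real
# hash function may differ). It is computed over different input bytes
# in each case, so the two values will almost certainly differ.
sjis_hash = zlib.crc32(sjis_bytes)
utf8_hash = zlib.crc32(utf8_bytes)

print(sjis_bytes.hex(), hex(sjis_hash))
print(utf8_bytes.hex(), hex(utf8_hash))
```

A key stored under the Shift-JIS hash would therefore generally land in a
different bucket once the hash is converted, which is why the conversion
implies rehashing.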

> > Regardless, I think at least a single string copy with comparison against
> > that copy within the hash functions is the only way to get correct
> > results.
>
> Yes. That's the point - a single string copy. Now each compare could do
> a transcode, i.e. generate a new string.

When I said "at least a single copy" I meant that we might have multiple
copies made, though that will definitely do nasty things to performance.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
