On Wed, Nov 12, 2003 at 09:18:24PM +0000, Nicholas Clark wrote:
> On Wed, Nov 12, 2003 at 01:57:14PM -0500, Dan Sugalski wrote:
>
> > You're going to run into problems no matter what you do, and as
> > transcoding could happen with each comparison arguably you need to make a
> > local copy of the string for each comparison, as otherwise you run the
> > risk of significant data loss as a string gets transcoded back and forth
> > across a lossy boundary.
>
> I think that this rules out what I was going to ask/suggest, having read
> Leo's patch. I was wondering why there wasn't a straight memcmp of the
> two strings whenever their encodings were the same. I presume that there
> are some encodings where two different binary representations are considered
> "equal", hence we can't blindly assume that a byte compare is sufficient.

Yep. AFAIK there are at least two different ways to express the German
umlaut ä (I can see it on my keyboard) in Unicode: the precomposed
character U+00E4, or a plain "a" followed by U+0308 COMBINING DIAERESIS.
They are canonically equivalent but have different byte representations.
I think Simon Cozens has a good paper (somewhere) about that.
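A minimal sketch of why a straight memcmp isn't enough, written in Python
with the standard unicodedata module rather than anything in Parrot. The
helper names bytes_equal and canonically_equal are made up for
illustration; the first mimics a byte compare of two UTF-8 buffers, the
second normalizes both sides to NFC before comparing.

```python
import unicodedata

# The same visible character, encoded two canonically equivalent ways.
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 'a' + U+0308 COMBINING DIAERESIS


def bytes_equal(a: str, b: str) -> bool:
    """Byte-wise compare of the UTF-8 forms (roughly what memcmp would see)."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))
print(bytes_equal(precomposed, decomposed))
print(canonically_equal(precomposed, decomposed))
```

The byte compare reports the two as different (C3 A4 versus 61 CC 88),
while the normalized compare treats them as equal. Normalizing costs a
copy per comparison, which is in line with Dan's point about working on
local copies.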
re, tc