On Aug 18, 2008, at 10:57 PM, Michael Ash wrote:
Note that depending on what kind of results you want, even if all of your data is within the BMP, this *still* won't save you. As a really basic example, consider a simple, obvious character like é. (That's an e with an acute accent on it if you're having unicode trouble in your e-mail client.) That can be represented as two separate unicode code points, a plain old ASCII e followed by a combining accent mark. If you should happen to split the string on the accent mark, such that the e goes into the first half and the combining accent mark goes into the second half, you get a really unintuitive result. What appears to the user to be a single character gets suddenly blown in two. Worse, if you happen to insert a string in the middle, you could end up applying that acute accent to some *other* letter instead.
Sorry, failed to mention that our UTF-16BE data was also normalized to pre-composed Unicode. So this case was handled.
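For anyone following along, here is a rough sketch of both the problem and the pre-composed fix. It uses Python and its standard `unicodedata` module purely for brevity (the thread itself is about NSString), and the sample string is made up for illustration.

```python
import unicodedata

# Decomposed form: plain 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"

# A naive split at code point index 4 separates the 'e' from its accent.
head, tail = decomposed[:4], decomposed[4:]

# Inserting text at that point moves the accent onto the inserted letter.
spliced = head + "x" + tail

# NFC normalization composes the pair into the single code point U+00E9,
# so there is no longer a boundary between the letter and its accent.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # prints "5 4"
```

Keep in mind that not every base-plus-accent pair has a pre-composed code point, so normalization narrows the problem rather than removing it in general.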
You mentioned Korean (which I have yet to play around with), but for another grand ol' time, try Arabic. You get into something called "positional variants". But alas, that's outside the scope of this list.
I think the moral of the story here is that when working with Unicode data, it's best to normalize such data and then ensure APIs operating on the data are Unicode-savvy.
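To make "normalize, then use Unicode-savvy operations" a bit more concrete, here is a minimal sketch in Python. The helpers `safe_split_index` and `safe_split` are hypothetical names invented for this example, not part of any library. Treating "nonzero canonical combining class" as "must stay with the previous character" is a simplifying assumption; real grapheme cluster segmentation has more rules than this.

```python
import unicodedata


def safe_split_index(text: str, index: int) -> int:
    """Move index left until it no longer points at a combining mark.

    Simplification: only characters with a nonzero canonical combining
    class are treated as attached to the preceding character.
    """
    while 0 < index < len(text) and unicodedata.combining(text[index]):
        index -= 1
    return index


def safe_split(text: str, index: int) -> tuple[str, str]:
    """Normalize to NFC first, then split without orphaning an accent."""
    normalized = unicodedata.normalize("NFC", text)
    cut = safe_split_index(normalized, index)
    return normalized[:cut], normalized[cut:]


# 'x' plus an acute accent has no pre-composed form, so NFC leaves the
# combining mark in place and the split point has to move left instead.
left, right = safe_split("ax\u0301b", 2)
```

Here the requested split at index 2 would have landed on the accent, so the sketch cuts at index 1 and the accent stays with its `x`.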
Thankfully, as you've pointed out, the NSString etc. APIs shield folks from many of the gory details.
___________________________________________________________
Ricky A. Sharp
Instant Interactive(tm) http://www.instantinteractive.com