On 18 Jun 2008, at 07:25, Andrew Farmer wrote:

> NSStrings are encoding-independent. They represent strings, not sequences of bytes.

Not *entirely*. The docs are a little sloppy on this, unfortunately, for both Cocoa and Core Foundation; in both cases they talk about "Unicode characters" and suggest that these may be 16 bits in size. At one point in the past, Unicode (as opposed to ISO 10646, which later merged with it, if I've got my history right) was indeed a 16-bits-per-"character" encoding, which is probably why the docs read the way they do. That isn't really true today, though, so it's best not to think of it that way.

Perhaps more accurately, NSString is a sequence of UTF-16 code units, which is not the same thing at all. (In fact, "character" is generally a word to avoid, because it's often unclear what you mean when you use it.)
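To make the code-unit/code-point distinction concrete, here's a small sketch. It's in Python rather than Objective-C purely so it runs anywhere. Python's own str indexes by code point, so the helper (a hypothetical name, utf16_code_units) encodes to UTF-16 to expose the units that NSString actually counts. For the same two-code-point string, NSString's -length should report 3, not 2.

```python
def utf16_code_units(s):
    """Return the UTF-16 code units of s (the units NSString indexes by)."""
    data = s.encode("utf-16-be")  # big-endian, so no byte-order mark is emitted
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


text = "G\U0001D11E"  # LATIN CAPITAL LETTER G + MUSICAL SYMBOL G CLEF
units = utf16_code_units(text)
print(len(text))                    # 2 (code points)
print(len(units))                   # 3 (UTF-16 code units)
print([f"{u:04X}" for u in units])  # ['0047', 'D834', 'DD1E']
```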

In particular, -characterAtIndex: can return either half of a surrogate pair. For example, if you have a string containing a non-BMP code point like MUSICAL SYMBOL G CLEF (U+1D11E), which is encoded as D834 DD1E according to Character Palette, you might get 0xD834 or 0xDD1E, but you will never get 0x1D11E. Nor is that the only trap for the unwary; you can also get various kinds of Unicode control codes, as well as several kinds of combining characters (accents are probably the most common group).
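The surrogate arithmetic itself is simple, which makes it easy to check the D834 DD1E figure by hand. A minimal sketch, again in Python, with hypothetical helper names: subtract 0x10000, then put the top 10 bits in a high surrogate (D800 to DBFF) and the bottom 10 bits in a low surrogate (DC00 to DFFF). Core Foundation already provides the decoding direction as CFStringGetLongCharacterForSurrogatePair(), so in real Cocoa code you'd probably use that rather than rolling your own.

```python
def to_surrogates(cp):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([f"{u:04X}" for u in to_surrogates(0x1D11E)])  # ['D834', 'DD1E']
print(f"{from_surrogates(0xD834, 0xDD1E):X}")         # 1D11E
```

Either half on its own (what -characterAtIndex: may hand you) is not a valid code point, which is why you need both to reconstruct U+1D11E.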

The String Programming Guide does warn about this to some extent:

"If you need to access string objects character-by-character, you must understand the Unicode character encoding—specifically, issues related to composed character sequences."
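On the Cocoa side, the tool the guide is steering you towards is -rangeOfComposedCharacterSequenceAtIndex:, which gives you the range of a whole sequence rather than a single code unit. The sketch below only illustrates the idea: the hypothetical rough_clusters() attaches each combining mark (anything with a non-zero canonical combining class) to the character before it. It is a deliberate simplification that ignores the full grapheme-cluster rules in UAX #29, so don't treat it as equivalent to what Foundation does.

```python
import unicodedata


def rough_clusters(s):
    """Group each base character with the combining marks that follow it.

    Simplification: only the canonical combining class is consulted,
    not the full UAX #29 grapheme-cluster rules.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters


decomposed = "e\u0301te\u0301"      # "été" spelled with COMBINING ACUTE ACCENT
print(len(decomposed))              # 5 (code points)
print(len(rough_clusters(decomposed)))  # 3 (user-visible "characters")
print(unicodedata.normalize("NFC", decomposed) == "\u00e9t\u00e9")  # True
```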

Anyway, this often isn't a big deal, but in some applications it can be, so it's worth bearing in mind.

Kind regards,

Alastair.

--
http://alastairs-place.net

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
