On Thu, Nov 5, 2009 at 8:04 AM, Ryan Homer <hzc.li...@gmail.com> wrote:
> Actually,
>
> That was a bad example, since \u only allows up to 4 digits, so the string
> was in fact a length of 3 characters, the character '5' being the 3rd.
> However, the issue still seems to exist.
>
> I have the actual characters in a text file and an application that imports
> the data. When the application imports the string with those two characters,
> it returns a length of 3. I will paste the characters directly into the
> string constant, though some people might not be able to see them.
>
> NSString *s = @"灵𤟥";
> NSLog(@"%@ (length=%d)", s, s.length);
>
> OR
>
> NSString *s = @"\u7075\xf0\xa4\x9f\xa5";
> NSLog(@"%@ (length=%d)", s, s.length);
>
> still returns a length of 3.
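(A quick way to see where the count of 3 comes from, sketched in Python rather than Objective-C so it can be run anywhere; the UTF-16 encoding it inspects is the same representation NSString uses internally:)

```python
# The two characters from the example above:
# U+7075 is in the Basic Multilingual Plane; U+247E5 is outside it.
s = "\u7075\U000247E5"

# Python 3's len() counts Unicode code points: 2.
code_points = len(s)

# UTF-16 must represent U+247E5 as a surrogate pair (two 16-bit
# code units), so the code-unit count -- which is what NSString's
# -length reports -- is 3.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points, utf16_units)  # 2 3
```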
NSString uses UTF-16 internally, so your U+247E5 character is represented by a surrogate pair of two UTF-16 code units. In general, you should never expect the number of code units in a string, as a programmer sees it, to match the number of characters in a string as a user would see them. You don't even have to involve characters outside of the Basic Multilingual Plane for this to be an issue. Take, for example, the string "müssen" (i.e. the verb "must" in German). There are two ways of representing this string, one of which will have a length of 6, while the other has a length of 7.

Is there any particular problem that this is causing in your code?

--
Clark S. Cox III
clarkc...@gmail.com

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
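(The "müssen" point above can be checked the same way; a short sketch using Python's standard-library unicodedata module, which exposes the same Unicode normalization forms Cocoa's precomposed/decomposed string methods do:)

```python
import unicodedata

# Precomposed (NFC) form: u-with-diaeresis is the single code
# point U+00FC, so the string is 6 code points long.
nfc = unicodedata.normalize("NFC", "müssen")

# Decomposed (NFD) form: 'u' followed by the combining diaeresis
# U+0308, so the same visible word is 7 code points long.
nfd = unicodedata.normalize("NFD", "müssen")

print(len(nfc), len(nfd))  # 6 7

# The two forms render identically and are canonically equivalent,
# which is why a raw length rarely matches what a user perceives.
```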