On May 5, 2014, at 2:06 PM, Jens Alfke <j...@mooseyard.com> wrote:

> How can I map a byte offset in a UTF-8 string back to the corresponding
> character offset in the NSString it came from?
>
> I’m writing an Objective-C wrapper around a C text-tokenizer API that takes a
> UTF-8 string as input, and as part of its output returns byte ranges of words
> that it found. So my API takes an NSString, converts it to UTF-8, passes that
> to the C API, and then gets these byte offsets that it needs to convert into
> character offsets in the NSString.
>
> I’ve looked through both the NSString and CFString APIs and didn’t see
> anything relevant to this. I know UTF-8 isn’t rocket science and I could
> pretty easily write my own function to scan through it counting characters,
> but I suspect I’d run into the differences between Unicode characters and the
> UTF-16 code points that NSString actually considers “characters”. I’d much
> rather let CF do this for me in an internally-consistent way.
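A minimal sketch of that scan in C, assuming the buffer is well-formed UTF-8 (as produced by converting an NSString) and that every byte offset the tokenizer returns falls on a character boundary. The function name `utf8_offset_to_utf16` is hypothetical. Because NSString lengths and ranges are counted in UTF-16 code units, each 1-, 2- or 3-byte UTF-8 sequence maps to one unit, and each 4-byte sequence (a character outside the Basic Multilingual Plane, such as an emoji) maps to two, a surrogate pair:

```c
#include <stddef.h>

/* Hypothetical helper: convert a byte offset into a well-formed UTF-8
 * buffer into the equivalent offset in UTF-16 code units, the unit that
 * NSString/CFString lengths and ranges are measured in.
 * Assumes byte_offset lies on a character boundary. */
size_t utf8_offset_to_utf16(const char *utf8, size_t byte_offset)
{
    size_t i = 0;     /* current byte position in the UTF-8 buffer */
    size_t units = 0; /* UTF-16 code units counted so far */

    while (i < byte_offset) {
        unsigned char c = (unsigned char)utf8[i];
        if (c < 0x80) {                  /* 0xxxxxxx: ASCII, 1 byte */
            i += 1;
            units += 1;
        } else if ((c & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
            i += 2;
            units += 1;
        } else if ((c & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence, rest of the BMP */
            i += 3;
            units += 1;
        } else if ((c & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence, a surrogate pair in UTF-16 */
            i += 4;
            units += 2;
        } else {
            /* Stray continuation or invalid lead byte: cannot occur in
             * well-formed input. Count it as one unit so the loop advances. */
            i += 1;
            units += 1;
        }
    }
    return units;
}
```

For example, in the UTF-8 bytes of "aé€😀b", byte offset 10 (the start of "b") maps to UTF-16 offset 5, because the emoji occupies four bytes but two UTF-16 code units. Calling this from offset 0 for every word range makes the total work quadratic; if the tokenizer reports ranges in increasing order, one could instead remember the last (byte offset, UTF-16 offset) pair and resume the scan from there.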
I ran into this same problem once, and I don't think there's any way to do it other than scanning through the string. The good news is that the documentation for CFStringGetLength specifically says the length is returned "in terms of UTF-16 code pairs" (that is, UTF-16 code units), so I don't think Apple can change that implementation detail without breaking the contract.

Charles

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)