On 25 Mar 2014, at 11:12 AM, Jens Alfke wrote:
> I agree -- it seems like the 32-bit equivalent of the more common mistake of
> accepting an input blob containing text without first checking that it’s
> valid UTF-8. I did that once, and after debugging the resulting file
> corruption bug I made this sign to stick on my monitor:
> http://mooseyard.com/Pictures/UntrustedUTF8.png
>
> Now, what method/function should we use to validate that an NSString actually
> contains valid Unicode code points?

We have this problem in a slightly different context: copy and paste and AppleScript can both sneak invalid strings into an app. We ended up simply looping through the string's UTF-16 content by hand and checking for bad surrogate pairs (which is what Jerry Krinock's U+DCC9 U+DF2D sequence is: two low surrogates, neither preceded by a high surrogate) as well as a handful of code points that are reserved as permanently invalid in Unicode (the noncharacters U+FFFE, U+FFFF, U+1FFFF, etc.) or disallowed in XML (U+0000, etc.).

You're welcome to pluck OFStringContainsInvalidSequences() / OFStringRangeOfNextInvalidCodepoint() from OmniFoundation (CFString-OFExtensions), if you like. (I'm not sure if OFStringRangeOfNextInvalidCodepoint() has made it to our published repository yet.)

However, it's clearly a bug that CFXMLCreateStringByUnescapingEntities() can return an invalid string.

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
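
For illustration, the kind of hand-rolled UTF-16 scan described above could look roughly like the following plain C. This is only a sketch and not the OmniFoundation code: the names is_disallowed_code_point() and first_invalid_utf16_index() are hypothetical. The exact set of rejected code points is also an assumption, namely the Unicode noncharacters plus the control characters that XML 1.0 excludes. It works on a raw buffer of 16-bit code units, such as one filled by CFStringGetCharacters().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if cp should be rejected even though it is
 * well-formed UTF-16. The policy here is an assumption for this sketch. */
static bool is_disallowed_code_point(uint32_t cp)
{
    /* XML 1.0 excludes U+0000 and the other C0 controls except TAB, LF, CR. */
    if (cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D)
        return true;
    /* Unicode noncharacters U+FDD0..U+FDEF. */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    /* Unicode noncharacters U+nFFFE and U+nFFFF in every plane
     * (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...). */
    if ((cp & 0xFFFE) == 0xFFFE)
        return true;
    return false;
}

/* Hypothetical helper: returns the index of the first code unit that starts
 * an invalid sequence, or `length` if the whole buffer is acceptable. */
size_t first_invalid_utf16_index(const uint16_t *units, size_t length)
{
    assert(units != NULL || length == 0);

    size_t i = 0;
    while (i < length) {
        uint16_t unit = units[i];
        uint32_t cp;
        size_t consumed;

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= length)
                return i;
            uint16_t low = units[i + 1];
            if (low < 0xDC00 || low > 0xDFFF)
                return i;
            cp = 0x10000
               + (((uint32_t)(unit - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
            consumed = 2;
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            /* Low surrogate with no high surrogate before it. */
            return i;
        } else {
            cp = unit;
            consumed = 1;
        }

        if (is_disallowed_code_point(cp))
            return i;
        i += consumed;
    }
    return length;
}
```

With this sketch, a buffer holding U+DCC9 U+DF2D is rejected at index 0, because the first unit is a low surrogate with nothing valid in front of it. Returning an index rather than a boolean lets the caller decide whether to reject the string or replace the offending unit (for example with U+FFFD) and continue scanning.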