On 25 Mar 2014, at 11:12 AM, Jens Alfke wrote:
> I agree — it seems like the 32-bit equivalent of the more common mistake of 
> accepting an input blob containing text without first checking that it’s 
> valid UTF-8. I did that once, and after debugging the resulting file 
> corruption bug I made this sign to stick on my monitor: 
> http://mooseyard.com/Pictures/UntrustedUTF8.png
> 
> Now, what method/function should we use to validate that an NSString actually 
> contains valid Unicode code points?

We have this problem in a slightly different context (copy-and-paste and 
AppleScript can both sneak invalid strings into an app). We ended up simply 
looping through the string's UTF-16 content by hand and checking for unpaired 
surrogates (Jerry Krinock's U+DCC9 U+DF2D sequence is two low surrogates with 
no high surrogate in front of them), as well as a handful of code points that 
are permanently reserved as noncharacters in Unicode (U+FFFE, U+FFFF, U+1FFFF, 
etc.) or disallowed in XML (U+0000, etc.).
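For illustration, here is a minimal C sketch of that kind of hand-written loop over raw UTF-16 code units. It is not the OmniFoundation code: the names `utf16_next_invalid()`, `utf16_contains_invalid()` and `codepoint_is_forbidden()` are hypothetical, and it assumes the string's contents have already been copied into a `uint16_t` buffer (for an NSString, e.g. with -getCharacters:range:). The set of rejected code points (Unicode noncharacters plus the C0 controls XML 1.0 forbids) is one reasonable choice, not an exhaustive list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: reject Unicode noncharacters and the control
   characters below U+0020 that XML 1.0 does not allow. */
static bool codepoint_is_forbidden(uint32_t cp)
{
    /* Noncharacters U+FDD0..U+FDEF. */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    /* Last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE,
       U+1FFFF, ..., U+10FFFE, U+10FFFF. */
    if ((cp & 0xFFFE) == 0xFFFE)
        return true;
    /* XML 1.0 allows only TAB, LF and CR below U+0020. */
    if (cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D)
        return true;
    return false;
}

/* Find the next invalid sequence at or after `start`. On success, stores
   its location and length (in UTF-16 code units) and returns true. */
static bool utf16_next_invalid(const uint16_t *s, size_t len, size_t start,
                               size_t *out_loc, size_t *out_len)
{
    size_t i = start;
    while (i < len) {
        uint16_t u = s[i];
        uint32_t cp = 0;
        size_t width = 1;

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 < len && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                cp = 0x10000 + (((uint32_t)(u - 0xD800) << 10) |
                                (uint32_t)(s[i + 1] - 0xDC00));
                width = 2;
            } else {
                *out_loc = i;
                *out_len = 1;
                return true;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no high surrogate in front of it. */
            *out_loc = i;
            *out_len = 1;
            return true;
        } else {
            cp = u;
        }

        if (codepoint_is_forbidden(cp)) {
            *out_loc = i;
            *out_len = width;
            return true;
        }
        i += width;
    }
    return false;
}

/* Yes/no wrapper around the scanner. */
static bool utf16_contains_invalid(const uint16_t *s, size_t len)
{
    size_t loc, n;
    return utf16_next_invalid(s, len, 0, &loc, &n);
}
```

Returning a location and length rather than just a yes/no lets a caller report or replace each bad sequence in turn by restarting the scan at `loc + len`.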

You're welcome to pluck OFStringContainsInvalidSequences() / 
OFStringRangeOfNextInvalidCodepoint() from OmniFoundation 
(CFString-OFExtensions), if you like. (I'm not sure whether 
OFStringRangeOfNextInvalidCodepoint() has made it into our published 
repository yet.)

However, it's clearly a bug that CFXMLCreateStringByUnescapingEntities() can 
return an invalid string. 
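A minimal sketch of the kind of check an unescaper could apply, assuming (this is a guess about the mechanism, not a description of CFXMLCreateStringByUnescapingEntities() internals) that the bad string comes from a numeric character reference naming a lone surrogate, such as `&#xDCC9;`. The helpers `decode_char_ref()` and `is_xml_char()` are hypothetical; the accepted range follows the XML 1.0 `Char` production, which excludes surrogates, U+FFFE and U+FFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* XML 1.0 "Char" production:
   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
static bool is_xml_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Parse a complete reference such as "&#65;" or "&#x1F600;" into *out.
   Returns false if it is malformed, out of range, or names something that
   is not an XML Char (e.g. the lone surrogate "&#xDCC9;"). */
static bool decode_char_ref(const char *ref, uint32_t *out)
{
    if (ref[0] != '&' || ref[1] != '#')
        return false;
    const char *p = ref + 2;
    unsigned base = 10;
    if (*p == 'x') {            /* XML allows only a lowercase 'x' here */
        base = 16;
        p++;
    }
    uint32_t value = 0;
    int digits = 0;
    for (; *p != ';'; p++) {
        unsigned d;
        if (*p >= '0' && *p <= '9')
            d = (unsigned)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = (unsigned)(*p - 'a' + 10);
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = (unsigned)(*p - 'A' + 10);
        else
            return false;       /* bad digit, or no terminating ';' */
        value = value * base + d;
        if (value > 0x10FFFF)   /* bail out before uint32_t can overflow */
            return false;
        digits++;
    }
    if (digits == 0 || p[1] != '\0')
        return false;
    if (!is_xml_char(value))
        return false;
    *out = value;
    return true;
}
```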



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)