Re: CFXMLCreateStringByUnescapingEntities() bombs on "�"

Quincey Morris Tue, 25 Mar 2014 10:53:32 -0700

On Mar 25, 2014, at 10:04 , Jerry Krinock <je...@ieee.org> wrote:

> // Examine the result
> NSLog(@"bomb2 length=%ld", (long)[bomb2 length]) ;
> unichar char0 = [bomb2 characterAtIndex:0] ;
> NSLog(@"char0 = '%c' = %x = %d", char0, char0, char0) ;
> unichar char1 = [bomb2 characterAtIndex:1] ;
> NSLog(@"char1 = '%c' = %x = %d", char1, char1, char1) ;
> NSLog(@"bomb2 = '%@' THIS DOES NOT LOG AT ALL!!!", bomb2) ;
> printf("printf bomb2: %s\n", [bomb2 UTF8String]) ;
> 
> Here is the result:
> 
> TestApp[13859:303] bomb1 length=10
> TestApp[13859:303] bomb1 = '&#13207494'
> TestApp[13859:303] bomb2 length=2
> TestApp[13859:303] char0 = 'É' = dcc9 = 56521
> TestApp[13859:303] char1 = '-' = df2d = 57133
> printf bomb2: (null)
> 
> I don’t see why CFXMLCreateStringByUnescapingEntities() is even touching 
> bomb1, because it does not end in a semicolon.  There is no HTML entity in 
> bomb1.
> 
> The two characters in bomb2, U+DCC9 and U+DF2D, are unassigned characters in 
> the “Low Surrogates” block.  Changing the number “13207494” to a slightly 
> different value sometimes cures the problem.


You’ve got this slightly wrong. The 16-bit “characters” in an NSString aren’t 
Unicode characters (that is, code points). Rather, they’re UTF-16 code units. 
In some cases (specifically, with code units between D800 and DFFF), it takes 
two of these to represent one code point. Thus, in your example, it makes no 
sense to try to display the code units separately as characters.
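
To make the code unit vs. code point distinction concrete, here is a minimal 
sketch in plain C. The function names are mine, not part of Foundation or 
Core Foundation; it only classifies 16-bit units and combines a well-formed 
surrogate pair into a code point.

```c
#include <stdbool.h>
#include <stdint.h>

/* D800..DBFF: high (leading) surrogate code units. */
static bool is_high_surrogate(uint16_t u) {
    return u >= 0xD800 && u <= 0xDBFF;
}

/* DC00..DFFF: low (trailing) surrogate code units. */
static bool is_low_surrogate(uint16_t u) {
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a well-formed pair (high unit, then low unit) into one code point.
   Returns false and leaves *out untouched if the pair is not well formed. */
static bool decode_surrogate_pair(uint16_t hi, uint16_t lo, uint32_t *out) {
    if (!is_high_surrogate(hi) || !is_low_surrogate(lo)) {
        return false;
    }
    *out = 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return true;
}
```

For example, the units D83D DE00 combine to U+1F600. The two units in your 
log, DCC9 and DF2D, both fall in the trailing range, so this function rejects 
them as a pair.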

> This seems to me like a bug in CFXMLCreateStringByUnescapingEntities(), and 
> that the proper workaround would be to pre-flight its input value (bomb1) and 
> take evasive action if necessary. 

I agree this is probably a bug in CFXMLCreateStringByUnescapingEntities. It 
seems to have assumed a missing ‘;’ at the end of an otherwise valid escaped 
character entity. It probably shouldn’t make this assumption.

However, I also see this as a bug in your code, since you’re accepting “random” 
user input as formatted text (i.e. escaped HTML) without validation. That sort 
of assumption makes you prone to exploding bugs like your Core Data crash. It’s 
similar to buffer overflow bugs, in that not only can it cause crashes but also 
it can compromise system security.

Not every 32-bit value is a valid Unicode code point. Therefore, I don’t think 
it’s a *workaround* to validate your input. Since you have provided a technique 
that can enter any 32-bit value, it’s a necessary step.
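
As a rough illustration of that kind of validation, here is a sketch in plain 
C of a pre-flight check for a single decimal character reference. The function 
names are hypothetical, and it deliberately handles only the decimal "&#NNN;" 
form; hex references, named entities, and XML’s further restrictions on 
allowed characters are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if v is a Unicode scalar value: at most U+10FFFF and not a surrogate. */
static bool is_unicode_scalar(uint32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

/* Hypothetical pre-flight check: s must start with a decimal reference of the
   form "&#NNN;". It is accepted only if it has at least one digit, ends with
   ';', and the number is a Unicode scalar value. */
static bool is_valid_decimal_reference(const char *s) {
    if (s[0] != '&' || s[1] != '#') {
        return false;
    }
    uint32_t value = 0;
    size_t i = 2;
    for (; s[i] >= '0' && s[i] <= '9'; i++) {
        value = value * 10 + (uint32_t)(s[i] - '0');
        if (value > 0x10FFFF) {
            return false; /* out of range; bailing here also avoids overflow */
        }
    }
    return i > 2 && s[i] == ';' && is_unicode_scalar(value);
}
```

With this check, "&#233;" passes, while "&#13207494" fails for two reasons: 
the value is above 10FFFF, and the terminating ‘;’ is missing. Anything 
rejected can be left as literal text instead of being handed to the unescaper.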



