When CFXMLCreateStringByUnescapingEntities() is passed the string “&#13207494”, 
it returns a string of two unpaired Unicode surrogates.  An NSLog() whose 
output contains that string prints nothing, and the string also upsets Core Data.

// Define the problematic string
NSString* bomb1 = @"&#13207494" ;
NSLog(@"bomb1 length=%ld", (long)[bomb1 length]) ;
NSLog(@"bomb1 = '%@'", bomb1) ;

// Run it through CFXMLCreateStringByUnescapingEntities()
NSString* bomb2 = (NSString*)CFXMLCreateStringByUnescapingEntities(NULL,
    (CFStringRef)bomb1, NULL) ;

// Examine the result
NSLog(@"bomb2 length=%ld", (long)[bomb2 length]) ;
unichar char0 = [bomb2 characterAtIndex:0] ;
NSLog(@"char0 = '%c' = %x = %d", char0, char0, char0) ;
unichar char1 = [bomb2 characterAtIndex:1] ;
NSLog(@"char1 = '%c' = %x = %d", char1, char1, char1) ;
NSLog(@"bomb2 = '%@' THIS DOES NOT LOG AT ALL!!!", bomb2) ;
printf("printf bomb2: %s\n", [bomb2 UTF8String]) ;

Here is the result:

TestApp[13859:303] bomb1 length=10
TestApp[13859:303] bomb1 = '&#13207494'
TestApp[13859:303] bomb2 length=2
TestApp[13859:303] char0 = 'É' = dcc9 = 56521
TestApp[13859:303] char1 = '-' = df2d = 57133
printf bomb2: (null)

I don’t see why CFXMLCreateStringByUnescapingEntities() is even touching bomb1, 
because it does not end in a semicolon.  There is no HTML entity in bomb1.

The two characters in bomb2, U+DCC9 and U+DF2D, are both code units in the 
“Low Surrogates” block (U+DC00 to U+DFFF), so neither one follows the high 
surrogate it would need to form a valid pair.  Changing the number “13207494” 
to a slightly different value sometimes cures the problem.
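
To make the surrogate observation concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of a check for well-formed UTF-16.  The helper names are hypothetical, not part of Core Foundation or Foundation, and the sketch says nothing about why CFXMLCreateStringByUnescapingEntities() produces these values; it only shows that a buffer holding 0xDCC9 followed by 0xDF2D fails the pairing rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for a high (leading) surrogate, U+D800..U+DBFF. */
static bool is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* True for a low (trailing) surrogate, U+DC00..U+DFFF. */
static bool is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: returns true when every surrogate in buf[0..len)
   belongs to a high-then-low pair, i.e. the buffer is well-formed UTF-16. */
static bool utf16_is_well_formed(const uint16_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (is_high_surrogate(buf[i])) {
            /* A high surrogate must be followed immediately by a low one. */
            if (i + 1 >= len || !is_low_surrogate(buf[i + 1])) return false;
            i++;  /* skip the low half of the pair */
        } else if (is_low_surrogate(buf[i])) {
            /* A low surrogate with no preceding high surrogate is unpaired. */
            return false;
        }
    }
    return true;
}
```

From Objective-C, one way to feed this would be to copy the string's contents into a unichar buffer with -getCharacters:range: (unichar is a 16-bit unsigned type) and pass that buffer and the string's length.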

The Core Data upset occurs in -[NSManagedObjectContext save:] when an 
object has bomb2 as the value of a String attribute.  (Of course, that’s how I 
“discovered” this problem; a user managed to get a string containing bomb1 into 
their input XML data.)  The returned error is:

Error Code: 1671
Error Domain: NSCocoaErrorDomain
Localized Description: The operation couldn’t be completed. (Cocoa error 1671.)
NSValidationErrorKey: location  (This is the name of the String attribute.)
NSValidationErrorValue: 

Actually, the error viewer in my app, an NSTextView, displays the 
NSValidationErrorValue as a string of two identical characters that look like a 
square containing an upper-case letter A whose left half is blacked out.  But 
when I try to copy and paste that into any other app, including Mail.app, I get 
0 characters.

The “validation” error is *not* because the value of the ‘location’ attribute 
is nil.  The ‘location’ attribute is optional in the data model, and often is 
nil in working data sets.

This seems to me like a bug in CFXMLCreateStringByUnescapingEntities(), and 
the proper workaround would seem to be to pre-flight its input value (bomb1) 
and take evasive action if necessary.  But since the problem occurs with 
numbers other than 13207494, I need to know the bounds of the set of “bad” 
substrings.  Alternatively, a not-so-good workaround would be to detect 
“invalid” strings in the output.  I’m hoping that someone smarter than I am 
will know an answer that does not involve brute force :|
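
For the pre-flight idea, one conservative sketch in plain C follows.  It does not rely on knowing the exact set of inputs that trip CFXMLCreateStringByUnescapingEntities(); instead it assumes that any “&#” which is not followed by a well-formed, semicolon-terminated numeric reference to a Unicode scalar value should be treated as suspect.  That rule catches bomb1 twice over: the reference has no semicolon, and 13207494 is 0xC987C6, which is above the Unicode maximum of 0x10FFFF.  The function name is hypothetical, and the rule may well reject more than the real set of “bad” substrings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical pre-flight check.  Returns true if s contains "&#" followed by
   anything other than a well-formed, semicolon-terminated numeric character
   reference that names a Unicode scalar value. */
static bool has_suspect_numeric_reference(const char *s)
{
    const char *p = s;
    while ((p = strstr(p, "&#")) != NULL) {
        p += 2;
        int base = 10;
        if (*p == 'x' || *p == 'X') { base = 16; p++; }

        unsigned long value = 0;
        int digits = 0;
        for (;; p++) {
            int d;
            if (*p >= '0' && *p <= '9') d = *p - '0';
            else if (base == 16 && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
            else if (base == 16 && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
            else break;
            /* Stop accumulating once out of range, so long digit runs
               cannot overflow; the value stays above 0x10FFFF. */
            if (value <= 0x10FFFF) value = value * base + d;
            digits++;
        }

        if (digits == 0 || *p != ';') return true;            /* malformed or unterminated */
        if (value == 0 || value > 0x10FFFF) return true;      /* outside the Unicode range */
        if (value >= 0xD800 && value <= 0xDFFF) return true;  /* surrogate code point */
    }
    return false;
}
```

A caller could run this on the UTF-8 bytes of the input (for example, the result of -UTF8String on bomb1) and skip the unescaping call, or strip the offending reference, whenever it returns true.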

Thanks,

Jerry Krinock

