Still working on this and still getting nowhere, so another question:

Is there a way to prevent NSXMLElement converting '&' into '&amp;' so that I can 
resolve character entities myself in my own NSXMLElement category -init... 
method?

To recap the problem, the NSXML classes change '<' into '&lt;' and '&' into 
'&amp;' (when in string value content), just as they should according to the 
XML specs. But they don't convert '>' into '&gt;'. This is fine as the XML 
specs don't require this in most situations, but if '>' appears in the string 
']]>' (when not ending CDATA) then it must be escaped - but Apple's NSXML 
classes don't do this, generating invalid XML that cannot be opened by 
NSXMLDocument in this situation.

I tried creating my own -initWithName:validStringValue: method which did some 
jiggery-pokery and then called -initWithXMLString:, thinking that this wouldn't 
do any conversion, the idea being that I could force ']]>' to appear as 
']]&gt;' myself by creating the XML string directly rather than going through 
-setStringValue. But no. If you try this:

NSXMLElement *element = [[NSXMLElement alloc] 
initWithXMLString:@"<test>&gt;</test>"];
NSLog (@"%@", element);

The output is:

<test>></test>

In other words, the NSXMLElement automatically *forces* any occurrences of 
'&gt;' to become '>', no matter how you try to work around it. And this means 
that if the user has entered the string ']]>' and you need to encode that in 
XML somewhere, then the NSXML classes force you to write invalid XML that 
cannot be read.

I've also tried creating the element like this:
element = [[NSXMLNode alloc] initWithKind:NSXMLElementKind 
options:NSXMLNodePreserveAll];
[element setName:@"Test"];
[element setObjectValue:@"&gt;"];

But this comes out as:

<Test>&amp;gt;</Test>

Right now I'm thinking the only way around this is to nuke any occurrences of 
']]>' altogether, and just not allow this sequence to be written to file at 
all. It's unlikely the user will enter this string in the fields that get 
encoded to XML in my app, anyway, so it will probably never be an issue. But I 
can't count on that, and this isn't an ideal solution - I'd much rather just 
know that I can write valid XML by escaping necessary characters.

So, if anyone has any ideas of how to encode ']]>' as ']]&gt;' in the string 
value of an NSXMLElement (without it becoming ']]&amp;gt;'), I'd be very 
grateful.
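
One possible workaround, sketched here rather than tested against NSXML itself, is to leave the element's string value alone and patch the serialized XML just before writing it to disk. The function names KBEscapeCDATAEnd and KBValidXMLData are made up for illustration. The sketch assumes the document is written as UTF-8 and contains no CDATA sections (or comments or processing instructions) whose text includes ']]>', since a blind replace would corrupt those.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escape every "]]>" found in already-serialized XML text.
// ASSUMPTION: the XML has no CDATA sections, comments or processing
// instructions containing "]]>"; those would be damaged by this replace.
static NSString *KBEscapeCDATAEnd(NSString *xml)
{
    return [xml stringByReplacingOccurrencesOfString:@"]]>"
                                          withString:@"]]&gt;"];
}

// Hypothetical helper: serialize the document, then patch the output.
// ASSUMPTION: the document is meant to be stored as UTF-8.
static NSData *KBValidXMLData(NSXMLDocument *doc)
{
    NSString *xml = [doc XMLStringWithOptions:NSXMLNodePrettyPrint];
    return [KBEscapeCDATAEnd(xml) dataUsingEncoding:NSUTF8StringEncoding];
}
```

On reading, any conforming parser turns ']]&gt;' back into ']]>', so -stringValue should return what the user typed without any extra decoding step.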

I think I need to file a bug report on this, too.

Many thanks and all the best,
Keith



--- ORIGINAL MESSAGE ---

Just to follow up on this, yet again it seems that the NSXML classes are 
stricter about rejecting invalid XML when opening documents than about 
avoiding it when generating XML data. 
If you include the string "]]>" inside the stringValue of an NSXMLElement, the 
'>' does not get escaped as it should according to the XML specs, and when you 
generate XML document data including such an element and then try to read it 
again, NSXMLDocument will fail and report the error: "Sequence ']]>' not 
allowed in content". Some sample code to demonstrate the issue:

// Create an element containing some characters that should be escaped to
// create valid XML.
NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test"
    stringValue:@" < & > ]]> "] autorelease];
// Note how the '<' and '&' get escaped, but not the '>' (even though it should
// do in the ']]>' sequence).
NSLog(@"%@", element); // OUTPUT: <Test> &lt; &amp; > ]]> </Test>
// Now create an XML doc from the element and generate the data.
NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithRootElement:element]
    autorelease];
NSData *data = [xmlDoc XMLDataWithOptions:NSXMLNodePrettyPrint];
// Check the doc and data:
NSLog(@"XML Doc: %...@\ndata: %@", xmlDoc, data); // Yep, they are non-nil, all fine.
// Now load the data we created into an XML document.
NSError *error = nil;
xmlDoc = [[NSXMLDocument alloc] initWithData:data
    options:NSXMLNodePreserveWhitespace error:&error];
if (xmlDoc == nil) // If it failed, try with tidy.
    xmlDoc = [[NSXMLDocument alloc] initWithData:data
        options:NSXMLNodePreserveWhitespace error:&error];
// Did it fail?
if (xmlDoc == nil)
{
    // Run the error.
    if (error)
        [[NSAlert alertWithError:error] runModal];
    // Uh-oh... The error is: "Line 2: Sequence ']]>' not allowed in content".
    // Because the '>' should have been escaped.
}

In other words, although the NSXML classes will escape '<' and '&' correctly, 
they will not handle escaping '>' at all - even when it occurs in the invalid 
(except when terminating CDATA) sequence ']]>'.  This then causes the NSXML 
classes to fail when re-loading the document they just created from the same 
data, because NSXML is more fussy about reading than writing.

On the one hand, it is (sort of) fair enough to expect the user of these 
classes to ensure the string values are valid XML (even if it does mean every 
user of these classes having to be extra careful and become very familiar with 
the XML specs); on the other hand, how do I go about ensuring valid XML when 
this is user-generated data over which I have no control, and when the NSXML 
classes will tidy up the ampersands in any character entities I try to escape 
myself?

At first, I thought I could just replace all occurrences of ">" with "&gt;" 
using NSString's -stringByReplacingOccurrencesOfString:withString:, e.g.:

NSString *validXMLStr = [userStr stringByReplacingOccurrencesOfString:@">" 
withString:@"&gt;"];
NSXMLElement *element = [[NSXMLElement alloc] initWithName:@"Text" 
stringValue:validXMLStr];

Then, to restore it:
NSString *value = [element stringValue];
userStr = [value stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];

But of course, that won't work, because the "&gt;" I place in my "fixed" string 
will become "&amp;gt;" in the XML file. So, consider the user had written a 
string all about XML himself:

"It turns out that ']]>' needs changing to ']]&gt;' for valid XML..."

I then swap out the '>' in this situation to '&gt;':

"It turns out that ']]&gt;' needs changing to ']]&gt;' for valid XML..."

I then pass it to the stringValue of an NSXMLElement which encodes it as:

"It turns out that ']]&amp;gt;' needs changing to ']]&amp;gt;' for valid XML..."

Then I read it back out, get its string value, and swap all occurrences of 
'&gt;' for '>', and what we get on re-opening the file is:

"It turns out that ']]>' needs changing to ']]>' for valid XML..."

i.e. not what the user wrote. An unlikely situation, I know, but not impossible 
and I have to account for it.
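
For what it's worth, the round trip above only breaks because a '&gt;' typed by the user cannot be told apart from a '&gt;' produced by the replacement. A minimal sketch of a reversible variant follows, assuming it is acceptable for the stored value to be escaped twice (once by this code, once by NSXML). The function names KBEscapeAngle and KBUnescapeAngle are hypothetical and not part of any Apple API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escape '&' first, so that a literal "&gt;" typed by
// the user becomes "&amp;gt;" and can no longer be confused with the "&gt;"
// generated for '>' in the second step.
static NSString *KBEscapeAngle(NSString *userStr)
{
    NSString *s = [userStr stringByReplacingOccurrencesOfString:@"&"
                                                     withString:@"&amp;"];
    return [s stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
}

// Hypothetical helper: undo the two steps in the opposite order.
static NSString *KBUnescapeAngle(NSString *stored)
{
    NSString *s = [stored stringByReplacingOccurrencesOfString:@"&gt;"
                                                    withString:@">"];
    return [s stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
}
```

The escaped string would be passed to -initWithName:stringValue: and the result of -stringValue passed through KBUnescapeAngle on reading. The cost is that the file then holds '&amp;gt;' for every '>' the user typed, which is valid XML but only meaningful to code that knows about the extra layer.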

In other words, if NSXMLElement won't escape the '>' for me in situations where 
it should, how do I do it myself?

Am I missing something obvious?

Many thanks and all the best,
Keith

----- Original Message ----
From: Jens Alfke <j...@mooseyard.com>
To: Keith Blount <keithblo...@yahoo.com>
Cc: glenn andreas <gandr...@mac.com>; "cocoa-dev@lists.apple.com" 
<cocoa-dev@lists.apple.com>
Sent: Tue, February 9, 2010 9:37:46 PM
Subject: Re: NSXML and >


On Feb 9, 2010, at 1:03 PM, Keith Blount wrote:

> Great, many thanks for the reply, and for the location of the information in 
> the XML docs, that's very helpful. Unfortunately, it seems that the NSXML 
> classes don't fix the '>' in the ']]>' case either, though:
> NSXMLElement *element = [[[NSXMLElement alloc] 
> initWithName:@"Test" stringValue:@"< & > ]]>"] autorelease];
> NSLog (@"%@", element);

The ">" in "]]>" only needs to be escaped when it's inside a CDATA, I believe. 
(Since that string marks the end of a CDATA.)

—Jens


