I'm happy to take the code below as a patch. I think that would be the best thing to do in this case, because I don't really have anything else to add.
Jay

On Thu, 5 Jan 2012 18:07:47 +0000, Norman Gray <nor...@astro.gla.ac.uk> mumbled:
>
> Jay, hello.
>
> On 4 Jan 2012, at 20:53, Jay McCarthy wrote:
>
> >> In the XML module's cdata struct, "[t]he string field is assumed to
> >> be of the form <![CDATA[‹content›]]> with proper quoting of
> >> ‹content›." It's not clear that this is a very useful design of the
> >> interface.
> >
> >> Principally, it makes it inconvenient to get at the <content>, and
> >> requires calls to substring (or something like that) in order to
> >> extract the <content> from cdata-string.
> >
> > I'm happy including a helper function that does the substring.
>
> An alternative would be to define, say, the following.
>
> #lang racket
> (struct source (start stop)) ; dummy definition
> (struct cdata source (chars)
>   #:guard (λ (start stop chars type)
>             (cond ((regexp-match #rx"^<!\\[CDATA\\[(.*)]]>$" chars)
>                    => (λ (m)
>                         (values start stop (list-ref m 1))))
>                   (else (values start stop chars)))))
> (define (cdata-string cdata)
>   (string-append "<![CDATA[" (cdata-chars cdata) "]]>"))
> (define c1 (cdata #f #f "cdata1"))
> (define c2 (cdata #f #f "<![CDATA[cdata2]]>"))
> (printf "c1: ~a & ~a~%" (cdata-chars c1) (cdata-string c1))
> (printf "c2: ~a & ~a~%" (cdata-chars c2) (cdata-string c2))
>
> =>
>
> c1: cdata1 & <![CDATA[cdata1]]>
> c2: cdata2 & <![CDATA[cdata2]]>
>
> This would entail corresponding changes to the XML writer, but would be
> coherent and backward compatible, in the sense that something that was
> illegal before would become legal, but nothing hitherto legal would become
> illegal.
>
> >> Secondly, it represents low-level syntactical information which
> >> should not, I think, be present in the result of a parse of an XML
> >> document. The fact that the content string originated from within a
> >> CDATA section is, I think, useful to know, but only just.
> >> Note that the fact that a string or character originated within a CDATA
> >> section is not part of the XML information set
> >> (<http://www.w3.org/TR/xml-infoset/> Sect. 2.6, and Appx D point
> >> 19). Supposing (which would be sturdily defensible) that xexprs
> >> should represent no more than the content of the XML information
> >> set, then there would be no need for the cdata structure at all
> >> (though this obviously makes escaping characters on output somewhat
> >> more involved).
> >
> > I'm happy making the backwards compatible change of changing the
> > reader to never produce them.
>
> Right, so parsing "<p>Foo <![CDATA[b&r<>]]> baz</p>" would produce
>
> (list 'p '() "Foo " "b&r<>" " baz")
>
> or
>
> (list 'p '() "Foo b&r<> baz")
>
> The only arguable downside to this is that the presence of a #<cdata>
> structure gives the caller a hint that there's something that (someone
> thought) needs escaping here. However, if they're being as careful as they
> should be about escaping before outputting, then this won't make any
> difference.
>
> >> It's also completely counterintuitive: the documentation of this
> >> struct is only three sentences long, and when reading it I _still_
> >> managed to elide the explanation that the CDATA line-noise actually
> >> had to be included in the string, presumably because it seemed so
> >> obvious that it wouldn't.
> >
> > The sentence is there because it is non-intuitive. I don't know any
> > other way to say it. The XML collect doesn't insert the wrapper, it
> > assumes it is already there.
>
> Perhaps a big "NOTE:" at the beginning of the second paragraph would draw
> attention to it.
>
> >> Side-issue regarding the wording of the documentation: it's not
> >> completely clear what "proper quoting of content" means. I presume
> >> it means purely racket-quoting of the string contents, and doesn't
> >> refer to XML quoting at all.
> >> Thus (cdata #f #f "<![CDATA[\"&]]>")
> >> would be acceptable in principle (it is acceptable in fact).
> >
> > It refers to the fact that "]]>" cannot appear in the content.
>
> We may be at cross-purposes, then, but it's still not clear what "proper
> quoting" refers to, since there's no scope for quoting the contents of CDATA
> sections. If you want to include "]]>" within/near a CDATA section (perhaps
> you're writing about CDATA sections, or you have a taste for esoteric
> smilies: 8]]> "gleeful person with handlebar moustache"), then you'd have to
> do something like <![CDATA[esoteric smilie: 8]]]]><![CDATA[> "gleeful"]]>
>
> I think it would be reasonable for write-xexpr and friends to simply throw an
> error if they find a "]]>" in CDATA content, leaving it up to the creator of
> the xexpr to handle this corner case themself.
>
> >> Is there any chance of a (admittedly backward-incompatible) change
> >> to this part of the interface? I doubt that the cdata structure is
> >> very extensively used.
> >
> > I believe its main use is in including Javascript output where XML
> > quoting will cause stuff like "1 < 2" to fail to compile in most
> > browsers. In that case, it is very important that the CDATA tags not
> > be there (i.e. we WANT invalid XML) because browsers will break on
> > that too.
>
> That's the broad sort of situation where I'm using it. Looking at Eli's
> Javascript example, I think that's a case where the module can properly leave
> such two-language-at-a-time hacking to the (poor) author, and blithely output
> <![CDATA[...]]> in all cases.
>
> Best wishes,
>
> Norman
>
> --
> Norman Gray : http://nxg.me.uk

--
Jay McCarthy <jay.mccar...@gmail.com>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93
____________________
Racket Users list: http://lists.racket-lang.org/users
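The thread raises three string-level operations: stripping the <![CDATA[...]]> wrapper from the current cdata-string format (the substring helper Jay offers), emitting content that itself contains "]]>" by splitting it across adjacent CDATA sections (Norman's workaround), and merging the adjacent strings a reader would leave behind once it stops producing cdata structs. The Racket below is a minimal sketch of all three, under stated assumptions. The names cdata-string->content, content->cdata-sections and merge-adjacent-strings are hypothetical and are not part of Racket's xml library. The functions also work on plain strings and lists rather than on the cdata struct itself.

```racket
#lang racket/base
;; Sketch only: these helper names are HYPOTHETICAL and are not part of
;; Racket's xml library. They operate on plain strings and lists, assuming
;; the current convention that a cdata string field holds "<![CDATA[...]]>".
(require racket/string)

(define cdata-open "<![CDATA[")
(define cdata-close "]]>")

;; cdata-string->content : string -> string
;; The "helper that does the substring": strip the CDATA wrapper.
;; Strings that lack the wrapper are returned unchanged. The prefix and
;; suffix cannot overlap, so the substring bounds are always valid.
(define (cdata-string->content s)
  (if (and (string-prefix? s cdata-open)
           (string-suffix? s cdata-close))
      (substring s
                 (string-length cdata-open)
                 (- (string-length s) (string-length cdata-close)))
      s))

;; content->cdata-sections : string -> string
;; Wrap arbitrary text as CDATA. "]]>" may not occur inside a CDATA
;; section, so each occurrence is split across two adjacent sections:
;; "...]]" closes one section and ">..." opens the next.
(define (content->cdata-sections text)
  (string-append cdata-open
                 (string-replace text "]]>" "]]]]><![CDATA[>")
                 cdata-close))

;; merge-adjacent-strings : list -> list
;; Collapse runs of adjacent strings in an element's content list, e.g.
;; '("Foo " "b&r<>" " baz") becomes '("Foo b&r<> baz"). Non-string
;; items (such as nested elements) are kept in place.
(define (merge-adjacent-strings items)
  (let loop ([items items] [acc '()])
    (cond
      [(null? items) (reverse acc)]
      [(and (string? (car items)) (pair? acc) (string? (car acc)))
       (loop (cdr items)
             (cons (string-append (car acc) (car items)) (cdr acc)))]
      [else (loop (cdr items) (cons (car items) acc))])))

(module+ main
  ;; prints: cdata2
  (displayln (cdata-string->content "<![CDATA[cdata2]]>"))
  ;; prints: <![CDATA[8]]]]><![CDATA[> gleeful]]>
  (displayln (content->cdata-sections "8]]> gleeful"))
  ;; prints: ("Foo b&r<> baz")
  (writeln (merge-adjacent-strings '("Foo " "b&r<>" " baz"))))
```

Splitting on "]]>" keeps the output well-formed while preserving the character data exactly. The stricter alternative raised in the thread is to have write-xexpr raise an error whenever the content contains "]]>". That would replace the string-replace call with a string-contains? check.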