Here's another way, using CDATA to represent band names. (Searching around, your question seems to be a common problem [1,2] and this was one of the suggested solutions.)
*Disclaimer: I haven't worked with XML much, so maybe your/Jay's way is preferred.

#lang racket
(require xml xml/path)

;; The string inside a cdata struct still carries the <![CDATA[ ... ]]>
;; wrapper, so strip it with a regexp.
(define (cdata->string cd)
  (second (regexp-match #px"^<!\\[CDATA\\[(.*)\\]\\]>$"
                        (cdata-string cd))))

(map cdata->string
     (se-path*/list
      '(name)
      (xml->xexpr
       (read-xml/element
        (open-input-string
         "<bands><name><![CDATA[Derek & the Dominos]]></name> <name><![CDATA[Nick Cave & the Bad Seeds]]></name></bands>")))))

[1]: http://forums.asp.net/t/1340605.aspx
[2]: http://stackoverflow.com/questions/1654674/url-and-the-ampersand

On Sun, Dec 15, 2013 at 12:52 PM, Giacomo Ritucci <giacomo.ritu...@gmail.com> wrote:
> Thanks Jay, string-append* is really handy here.
>
> Another hint came from Matthew Butterick, who pointed me to a message from
> Matthias Felleisen that suggested using match
> (http://lists.racket-lang.org/users/archive/2013-June/058426.html)
>
> I experimented a bit with the following example that combines a simple but
> not trivial XML structure, whitespace and entities:
>
> https://gist.github.com/rjack/7968318
>
> (Any feedback is highly appreciated. For example, Jay mentioning
> string-append* allowed me to get rid of all (apply string-append ...))
>
> Honestly, my first thought was "That's an overly difficult approach to a
> simple query on XML data".
>
> Thoughts:
>
> 1. eliminate-whitespace was key to successfully using match, I wish I had
>    found it earlier
> 2. match patterns and list operations are really difficult to read (and
>    write) compared to the equivalent xpath expression
> 3. it would be great if the XML library could provide helper functions
>    (something like xe->string and xe-string=?)
>
> Is there some interest in polishing this example so it can be turned into a
> tutorial or a guide for the Racket XML library documentation? From a newbie
> point of view this way of querying XML is not obvious.
>
> Feedback, fixes and suggestions are highly appreciated.
>
> Thanks again,
> Giacomo
>
>
>
> On Tue, Dec 10, 2013 at 12:45 AM, Jay McCarthy <jay.mccar...@gmail.com>
> wrote:
>>
>> Hi Giacomo,
>>
>> I think I would do this:
>>
>> (define (xe->string n)
>>   (string-append* (rest (rest n))))
>>
>> (check-equal? (map xe->string (se-path*/list '(bands) xe))
>>               '("Derek & the Dominos" "Nick Cave & the Bad Seeds"))
>>
>> Because you want the children of "bands" and you want to turn each one
>> into a string.
>>
>>
>> On Sat, Dec 7, 2013 at 6:30 PM, Giacomo Ritucci
>> <giacomo.ritu...@gmail.com> wrote:
>> > Hi Jay,
>> >
>> > thanks for your reply.
>> >
>> > Unfortunately I can't find a way in my code to detect that in the
>> > resulting list from se-path*/list
>> >
>> > '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")
>> >
>> > the first three elements should actually be treated as a single string,
>> > and likewise the last three.
>> >
>> > Is there a common idiom in Racket to extract a list of values from an
>> > XML collection, in a way that works with &amp; and other entities?
>> >
>> > Thanks in advance.
>> >
>> >
>> > On Mon, Dec 2, 2013 at 9:27 PM, Jay McCarthy <jay.mccar...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Giacomo,
>> >>
>> >> First, the question is not really about se-path*/list, because if you
>> >> look at the xexpr you're giving it, the "name" node has three string
>> >> children:
>> >>
>> >> '(bands () (name () "Derek " "&" " the Dominos") (name () "Nick Cave "
>> >> "&" " the Bad Seeds"))
>> >>
>> >> And se-path*/list gives you these children all appended together. If
>> >> you got the name nodes themselves, then you could concatenate their
>> >> children.
>> >>
>> >> Second, the real question is about why parsing XML works like that.
>> >> If you look at this:
>> >>
>> >> (define xs
>> >>   "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>")
>> >> (define x
>> >>   (read-xml/document (open-input-string xs)))
>> >> x
>> >>
>> >> Then you'll see that the core is that name doesn't have a single piece
>> >> of PCDATA. It has three, one of which is an entity.
>> >>
>> >> I don't consider this an error in the XML parser, but a consequence of
>> >> XML entities that might not be obvious: they are their own nodes in
>> >> the list of children of the parent node.
>> >>
>> >> Jay
>> >>
>> >>
>> >> On Sun, Dec 1, 2013 at 8:36 AM, Giacomo Ritucci
>> >> <giacomo.ritu...@gmail.com> wrote:
>> >> > Hi Racket Users,
>> >> >
>> >> > I'm using se-path*/list to extract values from an XML collection but
>> >> > I found a strange behaviour when the extracted values contain
>> >> > entities.
>> >> >
>> >> > For example, given the following XML:
>> >> >
>> >> > <bands>
>> >> >   <name>Derek &amp; the Dominos</name>
>> >> >   <name>Nick Cave &amp; the Bad Seeds</name>
>> >> > </bands>
>> >> >
>> >> > when I extract a list of band names with (se-path*/list '(name) xe)
>> >> > I'd expect this result:
>> >> >
>> >> > '("Derek & the Dominos" "Nick Cave & the Bad Seeds")
>> >> >
>> >> > but what I actually receive is:
>> >> >
>> >> > '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")
>> >> >
>> >> > Is this the intended behaviour? How can I overcome this and make
>> >> > se-path*/list return one string per tag?
>> >> >
>> >> > Here's my test code. I'm running Racket v5.3.6 on Linux x86_64 and
>> >> > maybe I'm overlooking something because I'm new to Racket.
>> >> >
>> >> > Thank you in advance!
>> >> >
>> >> > Best regards,
>> >> > Giacomo
>> >> >
>> >> > #lang racket
>> >> >
>> >> > (require xml
>> >> >          xml/path)
>> >> >
>> >> > (define xe (string->xexpr "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>"))
>> >> >
>> >> > (module+ test
>> >> >   (require rackunit)
>> >> >
>> >> >   ;; what I get
>> >> >   (check-equal? (se-path*/list '(name) xe)
>> >> >                 '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds"))
>> >> >
>> >> >   ;; what I'd expect
>> >> >   (check-equal? (se-path*/list '(name) xe)
>> >> >                 '("Derek & the Dominos" "Nick Cave & the Bad Seeds")))
>> >> >
>> >> > ____________________
>> >> > Racket Users list:
>> >> > http://lists.racket-lang.org/users
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jay McCarthy <j...@cs.byu.edu>
>> >> Assistant Professor / Brigham Young University
>> >> http://faculty.cs.byu.edu/~jay
>> >>
>> >> "The glory of God is Intelligence" - D&C 93
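
P.S. For comparison, Jay's suggestion quoted above (get the name nodes themselves, then concatenate their children) can probably be sketched without CDATA and without hard-coding the parent tag. This is only a rough sketch: elements-named and element-text are hypothetical helper names, not part of the xml library, and the code assumes every element x-expression carries an attribute list in second position, as in the x-expression Jay printed.

```racket
#lang racket
(require xml)

;; Hypothetical helper (not in the xml library): collect every element
;; x-expression whose tag is `tag`, searching depth-first.
;; Assumes each element looks like (tag (attrs ...) child ...).
(define (elements-named tag xe)
  (cond
    [(not (pair? xe)) '()]              ; strings, symbols, numbers, cdata
    [(eq? (first xe) tag) (list xe)]    ; found one; do not descend further
    [else (append-map (lambda (child) (elements-named tag child))
                      (rest (rest xe)))]))

;; Hypothetical helper: join the string children of one element,
;; ignoring any non-string children.
(define (element-text el)
  (string-append* (filter string? (rest (rest el)))))

(define xe
  (string->xexpr
   "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>"))

(map element-text (elements-named 'name xe))
```

This should give one string per name element instead of three. Non-string children (for example symbols for entities the reader does not expand) are simply dropped here, which may or may not be what you want.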