I apologize for the multiple postings, then. I kept receiving emails saying that my post was awaiting approval, and more than four days went by without news. Sorry for the lack of patience.
Thank you very much, Ben. Indeed that's how I've been doing it so far,
but I have accrued too many reasons not to work with the XML object any
more and to move all my coding to a list formulation. I wonder what you
mean by

> [...] but I find what you try below works if you specify useInternalNodes =
> TRUE in your invocation of xmlTreeParse

Actually, the error output that I included happens when I use
useInternalNodes=T (my bad). If I use useInternalNodes=F I get

> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
NULL

The useInternalNodes argument has proven fatally dangerous for me before.
If I parse a tree with useInternalNodes=T, save the workspace, close R and
reopen it, load the workspace and try to read the tree, it will completely
crash my computer, which has already cost me too many lost days of work.
On the other hand, useInternalNodes=F makes any XML operation ridiculously
slow. So the intention was to move everything to a more R-friendly object
like a list.

Any tips?

Best,

Santiago

2013/4/16 Ben Tupper <btup...@bigelow.org>:
> Hi,
>
> On Apr 16, 2013, at 2:49 PM, santiago gil wrote:
>>
>> 2013/4/14 santiago gil <sg.c...@gmail.com>:
>>> Hello all,
>>>
>>> I have a problem with the way attributes are dealt with in the
>>> function xmlToList(), and I haven't been able to figure it out for
>>> days now.
>>>
>
> I have not used xmlToList(), but I find what you try below works if you
> specify useInternalNodes = TRUE in your invocation of xmlTreeParse. Often
> that is the solution for many issues with xml. Also, I have found it best to
> write a relatively generic getter style function. So, in the example below I
> have written a function called getPortAttr - it will get attributes for the
> child node you name. I used your example as the defaults: "service" is the
> child to query and "name" is the attribute to retrieve from that child.
> It's a heck of a lot easier to write a function than building the longish parse
> strings with lots of [[this]][[and]][[that]] stuff, and it is reusable to
> boot.
>
> Cheers,
> Ben
>
> library(XML)
>
> mydoc <- '<host starttime="1365204834" endtime="1365205860">
> <status state="up" reason="echo-reply" reason_ttl="127"/>
> <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
> <ports>
> <port protocol="tcp" portid="135">
> <state state="open" reason="syn-ack" reason_ttl="127"/>
> <service name="msrpc" product="Microsoft Windows RPC" ostype="Windows"
> method="probed" conf="10">
> <cpe>cpe:/o:microsoft:windows</cpe>
> </service>
> </port>
> <port protocol="tcp" portid="139">
> <state state="open" reason="syn-ack" reason_ttl="127"/>
> <service name="netbios-ssn" method="probed" conf="10"/>
> </port>
> </ports>
> <times srtt="647" rttvar="71" to="100000"/>
> </host>'
>
> mytree<-xmlTreeParse(mydoc, useInternalNodes = TRUE)
> myroot<-xmlRoot(mytree)
>
> myports <- myroot[["ports"]]["port"]
>
>
> getPortAttr <- function(x, child = "service", attr = "name") {
>   kid <- x[[child]]
>   att <- xmlAttrs(kid)[[attr]]
>   att
> }
> portNames <- sapply(myports, getPortAttr)
> #> portNames
> #        port          port
> #     "msrpc" "netbios-ssn"
> portReason <- sapply(myports, getPortAttr, child = "state", attr = "reason")
> #> portReason
> #     port      port
> #"syn-ack" "syn-ack"
>
>
>>> Say I have a document (produced by nmap) like this:
>>>
>>>> mydoc <- '<host starttime="1365204834" endtime="1365205860"><status
>>>> state="up" reason="echo-reply" reason_ttl="127"/>
>>> <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
>>> <ports><port protocol="tcp" portid="135"><state state="open"
>>> reason="syn-ack" reason_ttl="127"/><service name="msrpc"
>>> product="Microsoft Windows RPC" ostype="Windows" method="probed"
>>> conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
>>> <port protocol="tcp" portid="139"><state state="open"
>>> reason="syn-ack" reason_ttl="127"/><service
>>> name="netbios-ssn"
>>> method="probed" conf="10"/></port>
>>> </ports>
>>> <times srtt="647" rttvar="71" to="100000"/>
>>> </host>'
>>>
>>> I want to store this as a list of lists, so I do:
>>>
>>> mytree<-xmlTreeParse(mydoc)
>>> myroot<-xmlRoot(mytree)
>>> mylist<-xmlToList(myroot)
>>>
>>> Now my problem is that when I want to fetch the attributes of the
>>> services running on each port, the behavior is not consistent:
>>>
>>>> mylist[["ports"]][[1]][["service"]]$.attrs["name"]
>>> name
>>> "msrpc"
>>>> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
>>> Error in trash_list[["ports"]][[2]][["service"]]$.attrs :
>>> $ operator is invalid for atomic vectors
>>>
>>> I understand that the way they are defined in the document is not the
>>> same, but I think there still should be a consistent behavior. I've
>>> tried many combinations of parameters for xmlTreeParse() but nothing
>>> has helped me. I can't find a way to call up the name of the service
>>> consistently regardless of whether the node has children or not. Any
>>> tips?
>>>
>>> All the best,
>>>
>>>
>>> S.G.
>>>
>>> --
>>> -------------------------------------------------------------------------------
>>> http://barabasilab.neu.edu/people/gil/
>>
>>
>>
>> --
>> -------------------------------------------------------------------------------
>> http://barabasilab.neu.edu/people/gil/
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> Ben Tupper
> Bigelow Laboratory for Ocean Sciences
> 60 Bigelow Drive, P.O.
> Box 380
> East Boothbay, Maine 04544
> http://www.bigelow.org
>

--
-------------------------------------------------------------------------------
http://barabasilab.neu.edu/people/gil/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
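
For the list formulation asked about in this thread, here is a minimal sketch in base R of one way to read an attribute consistently. It assumes the two shapes reported in the thread: a <service> node with children comes back as a list carrying a ".attrs" element, while a childless one comes back as a bare named character vector (hence the "$ operator is invalid for atomic vectors" error). The helper get_attr() and the two hand-built example objects are hypothetical stand-ins, not part of the XML package and not actual xmlToList() output.

```r
# Hypothetical helper (not part of the XML package): read one attribute from
# an element of a list-of-lists, whichever of the two shapes it has.
get_attr <- function(node, attr_name) {
  # Node with children: attributes sit in the ".attrs" element of a list.
  # Childless node: the element itself is the named character vector.
  attrs <- if (is.list(node)) node[[".attrs"]] else node
  if (is.null(attrs) || !(attr_name %in% names(attrs))) {
    return(NA_character_)  # missing node or missing attribute
  }
  unname(attrs[[attr_name]])
}

# Hand-built stand-ins for the two <service> shapes described in the thread.
service_with_child <- list(
  cpe    = "cpe:/o:microsoft:windows",
  .attrs = c(name = "msrpc", product = "Microsoft Windows RPC",
             ostype = "Windows", method = "probed", conf = "10")
)
service_without_child <- c(name = "netbios-ssn", method = "probed", conf = "10")

get_attr(service_with_child, "name")       # "msrpc"
get_attr(service_without_child, "name")    # "netbios-ssn"
get_attr(service_without_child, "ostype")  # NA
```

If each port in the parsed list really is a list with a "service" element, something like sapply(mylist[["ports"]], function(p) get_attr(p[["service"]], "name")) might then return one name per port. That depends on the shapes xmlToList() actually produces for a given parse, so it is worth checking with str() first.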