Hi,

On Apr 16, 2013, at 2:49 PM, santiago gil wrote:

>
> 2013/4/14 santiago gil <sg.c...@gmail.com>:
>> Hello all,
>>
>> I have a problem with the way attributes are dealt with in the
>> function xmlToList(), and I haven't been able to figure it out for
>> days now.
>>

I have not used xmlToList(), but I find what you try below works if you specify useInternalNodes = TRUE in your invocation of xmlTreeParse. Often that is the solution for many issues with xml.

Also, I have found it best to write a relatively generic getter style function. So, in the example below I have written a function called getPortAttr - it will get attributes for the child node you name. I used your example as the defaults: "service" is the child to query and "name" is the attribute to retrieve from that child. It's a heck of a lot easier to write a function than building the longish parse strings with lots of [[this]][[and]][[that]] stuff, and it is reusable to boot.

Cheers,
Ben

library(XML)

mydoc <- '<host starttime="1365204834" endtime="1365205860">
  <status state="up" reason="echo-reply" reason_ttl="127"/>
  <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open" reason="syn-ack" reason_ttl="127"/>
      <service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10">
        <cpe>cpe:/o:microsoft:windows</cpe>
      </service>
    </port>
    <port protocol="tcp" portid="139">
      <state state="open" reason="syn-ack" reason_ttl="127"/>
      <service name="netbios-ssn" method="probed" conf="10"/>
    </port>
  </ports>
  <times srtt="647" rttvar="71" to="100000"/>
</host>'

mytree <- xmlTreeParse(mydoc, useInternalNodes = TRUE)
myroot <- xmlRoot(mytree)
myports <- myroot[["ports"]]["port"]

getPortAttr <- function(x, child = "service", attr = "name") {
    kid <- x[[child]]
    att <- xmlAttrs(kid)[[attr]]
    att
}

portNames <- sapply(myports, getPortAttr)
#> portNames
#         port          port
#      "msrpc" "netbios-ssn"

portReason <- sapply(myports, getPortAttr, child = "state", attr = "reason")
#> portReason
#     port      port
#"syn-ack" "syn-ack"

>> Say I have a document (produced by nmap) like this:
>>
>>> mydoc <- '<host starttime="1365204834" endtime="1365205860"><status
>>> state="up" reason="echo-reply" reason_ttl="127"/>
>> <address
addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
>> <ports><port protocol="tcp" portid="135"><state state="open"
>> reason="syn-ack" reason_ttl="127"/><service name="msrpc"
>> product="Microsoft Windows RPC" ostype="Windows" method="probed"
>> conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
>> <port protocol="tcp" portid="139"><state state="open"
>> reason="syn-ack" reason_ttl="127"/><service name="netbios-ssn"
>> method="probed" conf="10"/></port>
>> </ports>
>> <times srtt="647" rttvar="71" to="100000"/>
>> </host>'
>>
>> I want to store this as a list of lists, so I do:
>>
>> mytree<-xmlTreeParse(mydoc)
>> myroot<-xmlRoot(mytree)
>> mylist<-xmlToList(myroot)
>>
>> Now my problem is that when I want to fetch the attributes of the
>> services running on each port, the behavior is not consistent:
>>
>>> mylist[["ports"]][[1]][["service"]]$.attrs["name"]
>>   name
>> "msrpc"
>>> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
>> Error in trash_list[["ports"]][[2]][["service"]]$.attrs :
>>  $ operator is invalid for atomic vectors
>>
>> I understand that the way they are defined in the document is not the
>> same, but I think there still should be a consistent behavior. I've
>> tried many combinations of parameters for xmlTreeParse() but nothing
>> has helped me. I can't find a way to call up the name of the service
>> consistently regardless of whether the node has children or not. Any
>> tips?
>>
>> All the best,
>>
>>
>> S.G.
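
If you do want to stay with the list from xmlToList(), the error quoted above suggests that a childless node comes back as a bare named character vector of its attributes, while a node with children keeps its attributes under a ".attrs" element. Below is a minimal sketch of an accessor that tolerates both shapes. Note that getListAttr is a hypothetical helper of my own naming (it is not part of the XML package), and the two example objects are hand-built stand-ins whose shapes are assumed from the session above, not regenerated from the real document.

```r
# Hypothetical helper (not from the XML package): fetch one attribute
# from an element of an xmlToList()-style result, whether or not the
# original node had children.
getListAttr <- function(x, attr = "name") {
    # Node with children: a list holding its attributes under ".attrs".
    # Childless node: assumed to be the bare named character vector.
    attrs <- if (is.list(x)) x[[".attrs"]] else x
    # Return NA rather than an error when the attribute is absent
    if (is.null(attrs) || !(attr %in% names(attrs))) return(NA_character_)
    unname(attrs[[attr]])
}

# Hand-built stand-ins for the two "service" elements (assumed shapes)
withKids <- list(cpe = "cpe:/o:microsoft:windows",
                 .attrs = c(name = "msrpc", method = "probed", conf = "10"))
noKids <- c(name = "netbios-ssn", method = "probed", conf = "10")

serviceNames <- sapply(list(withKids, noKids), getListAttr)
print(serviceNames)
```

On these stand-ins the two names returned are "msrpc" and "netbios-ssn". Against the real list, something along the lines of sapply(mylist[["ports"]], function(p) getListAttr(p[["service"]])) would be the natural next step, though I have not run that against actual xmlToList() output.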
>>
>> --
>> -------------------------------------------------------------------------------
>> http://barabasilab.neu.edu/people/gil/
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org