Hi,

On Apr 16, 2013, at 6:39 PM, santiago gil wrote:

> Thank you very much, Ben. Indeed that's how I've been doing it so far,
> but I have accrued too many reasons not to work with the XML object
> any more and move all my coding to a list formulation.
>
> I wonder what you mean with
>
>> [...] but I find what you try below works if you specify useInternalNodes =
>> TRUE in your invocation of xmlTreeParse
>
> Actually, the output error that I included happens when I use
> useInternalNodes=T (my bad).
My bad right back at you. It doesn't work here now (and didn't before, I
guess). I can't explain why xmlToList splits the two nodes so differently.
That's another good reason for me to shy away from it.

> If I use useInternalNodes=F I get
>
>> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
> NULL
>
> The useInternalNodes clause has proven fatally dangerous for me
> before. If I parse a tree with useInternalNodes=T, save the workspace,
> close R and reopen it, load the workspace and try to read the tree, it
> will completely crash my computer, which has already cost me too many
> lost days of work. On the other hand, useInternalNodes=F will result
> in any xml operation being ridiculously slow. So the intention was to
> move everything to a more R-friendly object like a list.

My experience with the XML package seems to be quite different from yours
regarding useInternalNodes = TRUE/FALSE. I get satisfactory and stable
performance with useInternalNodes = TRUE, so your experience is very
puzzling to me. I never save workspaces - heck, I'm not sure what XML does
with the external pointers in that case. Can you save an address and expect
to get the same address later? Instead I save the XML-formatted data using
saveXML, which dumps it to a nicely formed text file.

I guess I'm not much help! You might want to contact the maintainer of XML
with a small example, such as the one you posted. He has been very
responsive and helpful to me in the past.

Cheers,
Ben

> Best,
>
>
> Santiago
>
> 2013/4/16 Ben Tupper <btup...@bigelow.org>:
>> Hi,
>>
>> On Apr 16, 2013, at 2:49 PM, santiago gil wrote:
>>>
>>> 2013/4/14 santiago gil <sg.c...@gmail.com>:
>>>> Hello all,
>>>>
>>>> I have a problem with the way attributes are dealt with in the
>>>> function xmlToList(), and I haven't been able to figure it out for
>>>> days now.
>>>>
>>
>> I have not used xmlToList(), but I find what you try below works if you
>> specify useInternalNodes = TRUE in your invocation of xmlTreeParse.
>> Often that is the solution for many issues with xml. Also, I have found
>> it best to write a relatively generic getter style function. So, in the
>> example below I have written a function called getPortAttr - it will get
>> attributes for the child node you name. I used your example as the
>> defaults: "service" is the child to query and "name" is the attribute to
>> retrieve from that child. It's a heck of a lot easier to write a function
>> than building the longish parse strings with lots of
>> [[this]][[and]][[that]] stuff, and it is reusable to boot.
>>
>> Cheers,
>> Ben
>>
>> library(XML)
>>
>> mydoc <- '<host starttime="1365204834" endtime="1365205860">
>> <status state="up" reason="echo-reply" reason_ttl="127"/>
>> <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
>> <ports>
>> <port protocol="tcp" portid="135">
>> <state state="open" reason="syn-ack" reason_ttl="127"/>
>> <service name="msrpc" product="Microsoft Windows RPC" ostype="Windows"
>> method="probed" conf="10">
>> <cpe>cpe:/o:microsoft:windows</cpe>
>> </service>
>> </port>
>> <port protocol="tcp" portid="139">
>> <state state="open" reason="syn-ack" reason_ttl="127"/>
>> <service name="netbios-ssn" method="probed" conf="10"/>
>> </port>
>> </ports>
>> <times srtt="647" rttvar="71" to="100000"/>
>> </host>'
>>
>> mytree <- xmlTreeParse(mydoc, useInternalNodes = TRUE)
>> myroot <- xmlRoot(mytree)
>>
>> myports <- myroot[["ports"]]["port"]
>>
>> getPortAttr <- function(x, child = "service", attr = "name") {
>>   kid <- x[[child]]
>>   att <- xmlAttrs(kid)[[attr]]
>>   att
>> }
>> portNames <- sapply(myports, getPortAttr)
>> #> portNames
>> #          port          port
>> #       "msrpc" "netbios-ssn"
>> portReason <- sapply(myports, getPortAttr, child = "state", attr = "reason")
>> #> portReason
>> #     port      port
>> #"syn-ack" "syn-ack"
>>
>>>> Say I have a document (produced by nmap) like this:
>>>>
>>>>> mydoc <- '<host starttime="1365204834" endtime="1365205860"><status
>>>>> state="up" reason="echo-reply" reason_ttl="127"/>
>>>> <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
>>>> <ports><port protocol="tcp" portid="135"><state state="open"
>>>> reason="syn-ack" reason_ttl="127"/><service name="msrpc"
>>>> product="Microsoft Windows RPC" ostype="Windows" method="probed"
>>>> conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
>>>> <port protocol="tcp" portid="139"><state state="open"
>>>> reason="syn-ack" reason_ttl="127"/><service name="netbios-ssn"
>>>> method="probed" conf="10"/></port>
>>>> </ports>
>>>> <times srtt="647" rttvar="71" to="100000"/>
>>>> </host>'
>>>>
>>>> I want to store this as a list of lists, so I do:
>>>>
>>>> mytree<-xmlTreeParse(mydoc)
>>>> myroot<-xmlRoot(mytree)
>>>> mylist<-xmlToList(myroot)
>>>>
>>>> Now my problem is that when I want to fetch the attributes of the
>>>> services running on each port, the behavior is not consistent:
>>>>
>>>>> mylist[["ports"]][[1]][["service"]]$.attrs["name"]
>>>> name
>>>> "msrpc"
>>>>> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
>>>> Error in trash_list[["ports"]][[2]][["service"]]$.attrs :
>>>> $ operator is invalid for atomic vectors
>>>>
>>>> I understand that the way they are defined in the document is not the
>>>> same, but I think there still should be a consistent behavior. I've
>>>> tried many combinations of parameters for xmlTreeParse() but nothing
>>>> has helped me. I can't find a way to call up the name of the service
>>>> consistently regardless of whether the node has children or not. Any
>>>> tips?
>>>>
>>>> All the best,
>>>>
>>>>
>>>> S.G.
>>>>
>>>> --
>>>> http://barabasilab.neu.edu/people/gil/

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
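
On the original question (reading the service name consistently whether or not the node has children): the "$ operator is invalid for atomic vectors" error suggests that xmlToList() hands back a childless node as a bare named character vector of its attributes, while a node with children comes back as a list that carries its attributes under .attrs. Assuming that is what happens, a small helper can hide the difference. This is only a sketch: get_list_attr is a hypothetical name, not part of the XML package, and the two service objects are hand-built stand-ins for the xmlToList() output so that the example runs in base R alone.

```r
# Hypothetical helper (not part of the XML package): fetch one attribute
# from an element of an xmlToList()-style result, whatever its shape.
get_list_attr <- function(node, attr) {
  if (is.list(node)) {
    # Node had children: attributes are assumed to sit under ".attrs"
    attrs <- node[[".attrs"]]
  } else {
    # Childless node: assumed to be the bare named attribute vector
    attrs <- node
  }
  if (is.null(attrs) || !(attr %in% names(attrs))) {
    return(NA_character_)
  }
  unname(attrs[[attr]])
}

# Hand-built stand-ins for the two shapes seen in the thread (assumption)
service1 <- list(
  cpe = "cpe:/o:microsoft:windows",
  .attrs = c(name = "msrpc", product = "Microsoft Windows RPC",
             ostype = "Windows", method = "probed", conf = "10")
)
service2 <- c(name = "netbios-ssn", method = "probed", conf = "10")

ports <- list(port = list(service = service1),
              port = list(service = service2))

# One call works for both shapes
service_names <- vapply(
  ports,
  function(p) get_list_attr(p[["service"]], "name"),
  character(1)
)
print(service_names)
```

With a real parse you would pass mylist[["ports"]] in place of the hand-built ports list. This has not been checked against useInternalNodes = FALSE, where the thread reports a different result (NULL), so the shape assumption may need adjusting there.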