David Winsemius wrote:
> On 15 Dec 2007, you wrote in gmane.comp.lang.r.general:
>
>> If we can assume that the abstract is always the 4th paragraph then we
>> can try something like this:
>>
>> library(XML)
>> doc <- xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-",
>>     isURL = TRUE, useInternalNodes = TRUE, trim = TRUE)
>>
>> out <- cbind(
>>     Author = unlist(xpathApply(doc, "//author", xmlValue)),
>>     PMID = gsub(".*:", "", unlist(xpathApply(doc, "//guid", xmlValue))),
>>     Abstract = unlist(xpathApply(doc, "//description",
>>         function(x) {
>>             on.exit(free(doc2))
>>             doc2 <- htmlTreeParse(xmlValue(x)[[1]], asText = TRUE,
>>                 useInternalNodes = TRUE, trim = TRUE)
>>             xpathApply(doc2, "//p[4]", xmlValue)
>>         }
>>     )))
>> free(doc)
>> substring(out, 1, 25) # display first 25 chars of each field
>>
>>
>> The last line produces (it may look messed up in this email):
>>
>> > substring(out, 1, 25) # display it
>>      Author                      PMID       Abstract
>> [1,] " Goon P, Sonnex C, Jani P" "18046565" "Human papillomaviruses (H"
>> [2,] " Rad MH, Alizadeh E, Ilkh" "17978930" "Recurrent laryngeal papil"
>> [3,] " Lee LA, Cheng AJ, Fang T" "17975511" "OBJECTIVES:: Papillomas o"
>> [4,] " Gerein V, Schmandt S, Ba" "17935912" "BACKGROUND: Human papillo"
> snip
>>
>
> It looked beautifully regular in my newsreader. It is helpful to see an
> example showing the indexed access to nodes. It was also helpful to see the
> example of substring for column display. Thank you (for this and all of
> your other contributions.)
>
> I find upon further browsing that the pmfetch access point is obsolete.
> Experimentation with the PubMed eFetch server access point results in fully
> xml-tagged results:
>
> e.fetch.doc<- function (){
>    fetch.stem <-
>       "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
>    src.mode <- "db=pubmed&retmode=xml&"
>    request <- "id=11045395"
>    doc<-xmlTreeParse(paste(fetch.stem,src.mode,request,sep=""),
>                      isURL = TRUE, useInternalNodes = TRUE)
> }
> # in the debugging phase I needed to set useInternalNodes = TRUE to see the
> tags. Never did find a way to "print" them when internal.

saveXML(node) will return a string giving the XML content of that node
as a tree.

>
> doc<-e.fetch.doc()
> get.info<- function(doc){
>    df<-cbind(
>       Abstract = unlist(xpathApply(doc, "//AbstractText", xmlValue)),
>       Journal = unlist(xpathApply(doc, "//Title", xmlValue)),
>       Pmid = unlist(xpathApply(doc, "//PMID", xmlValue))
>    )
>    return(df)
> }
>
> # this works
> > substring(get.info(doc), 1, 25)
>      Abstract                    Journal                     Pmid
> [1,] "We studied the prevalence" "Pediatric nephrology (Ber" "11045395"
>
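To make the saveXML() remark above concrete, here is a minimal sketch that
runs without network access. The XML string is a hand-written, hypothetical
fragment that only borrows the tag names used in the thread (PMID, Title,
AbstractText); it is not real eFetch output, and the variable names are
my own.

```r
library(XML)

# Hypothetical fragment: placeholder values, tag names borrowed from the thread.
txt <- paste0("<Article>",
              "<PMID>00000000</PMID>",
              "<Title>Example Journal</Title>",
              "<AbstractText>Example abstract text.</AbstractText>",
              "</Article>")

# Parse from a string into internal (C-level) nodes, as in the code above.
doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# saveXML() on a single internal node returns its markup as a character string.
pmid.node <- getNodeSet(doc, "//PMID")[[1]]
pmid.xml <- saveXML(pmid.node)
cat(pmid.xml, "\n")

# The same call on the root node serialises the whole subtree.
root.xml <- saveXML(xmlRoot(doc))
cat(root.xml, "\n")

# Release the C-level document once the strings have been captured.
free(doc)
```

Since the result is an ordinary character string, cat() or print() is enough
to inspect the tags while keeping useInternalNodes = TRUE.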