David Winsemius <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> "Farrel Buchinsky" <[EMAIL PROTECTED]> wrote in
> news:[EMAIL PROTECTED]:
>
>> On Dec 13, 2007 11:35 PM, Robert Gentleman <[EMAIL PROTECTED]>
>> wrote:
>>> or just try looking in the annotate package from Bioconductor
>>>
>>
>> Yip. annotate seems to be the most streamlined way to do this.
>> 1) How does one turn the list that is created into a dataframe whose
>> column names are along the lines of date, title, journal, authors etc
>
> Gabor's example already did that task.
>

Actually the object returned by Gabor's method was a list of lists. Here
is one way (probably very inefficient) of getting "doc" into a
data.frame:

library(XML)   # provides xpathApply() and xmlValue()
# "doc" is the parsed feed from Gabor's example

colvals <- sapply(c("//title", "//author", "//category"), xpathApply,
                  doc = doc, fun = xmlValue)

titles=as.vector(unlist(colvals[1])[3:17])
# needed to drop extraneous titles for search name and an NCBI header
#>str(colvals)
#List of 3
# $ //title   :List of 17
#  ..$ : chr "PubMed: (\"Laryngeal Neoplasm..."
#  ..$ : chr "NCBI PubMed"

authors=colvals[[2]]
jrnls=colvals[[3]]

# not sure why, but trying to do it in one step failed:
# cites<-data.frame(titles=as.vector(unlist(colvals[1])[3:17]),
#                   authors=colvals[[2]],jnrls=colvals[[3]])
# Error in data.frame(titles = as.vector(unlist(colvals[1])[3:17]),
#   authors = colvals[[2]],  :
#   arguments imply differing number of rows: 15, 1

# but the following worked
cites<-data.frame(titles=as.vector(titles))
cites$author<-authors
cites$jrnls<-jrnls
cites

I am still wondering how to extract material that does not have an XML
tag. Each item looks like:

<item>
    <title>Gastroesophageal reflux in patients with recurrent laryngeal papillomatosis.</title>
    <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=17589729&dopt=Abstract</link>
    <description>
    <![CDATA[
        <table border="0" width="100%"><tr><td align="left"><a href="http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0034-72992007000200011&lng=en&nrm=iso&tlng=en"><img src="http://www.ncbi.nlm.nih.gov/entrez/query/egifs/http:--www.scielo.br-img-scielo_en.gif" border="0"/></a> </td><td align="right"><a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Display&dopt=PubMed_PubMed&from_uid=17589729">Related Articles</a></td></tr></table>
        <p><b>Gastroesophageal reflux in patients with recurrent laryngeal papillomatosis.</b></p>
        <p>Rev Bras Otorrinolaringol (Engl Ed). 2007 Mar-Apr;73(2):210-4</p>
        <p>Authors: Pignatari SS, Liriano RY, Avelino MA, Testa JR, Fujita R, De Marco EK</p>
        <p>Evidence of a relation between gastroesophaeal reflux and pediatric respiratory disorders increases every year. Many respiratory symptoms and clinical conditions such as stridor, chronic cough, and recurrent pneumonia and bronchitis appear to be related to gastroesophageal reflux. Some studies have also suggested that gastroesophageal reflux may be associated with recurrent laryngeal papillomatosis, contributing to its recurrence and severity. AIM: the aim of this study was to verify the frequency and intensity of gastroesophageal reflux in children with recurrent laryngeal papillomatosis. MATERIAL AND METHODS: ten children of both genders, aged between 3 and 12 years, presenting laryngeal papillomatosis, were included in this study. The children underwent 24-hour double-probe pH-metry. RESULTS: fifty percent of the patients had evidence of gastroesophageal reflux at the distal sphincter; 90% presented reflux at the proximal sphincter. CONCLUSION: the frequency of proximal gastroesophageal reflux is significantly increased in patients with recurrent laryngeal papillomatosis.</p>
        <p>PMID: 17589729 [PubMed - in process]</p>
    ]]>
    </description>
    <author>Pignatari SS, Liriano RY, Avelino MA, Testa JR, Fujita R, De Marco EK</author>
    <category>Rev Bras Otorrinolaringol (Engl Ed)</category>
    <guid isPermaLink="false">PubMed:17589729</guid>
</item>

I would like to access, for instance, the PMID or the abstract within
the <description> element, but I do not think they are named XML nodes
in the way that <author> and <category> are. Everything inside
<description> is a single CDATA block of HTML, so XPath sees it as one
text value. I suspect that getting the output in a different format,
say as MEDLINE, might produce output that was tagged more completely.

-- 
David Winsemius

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
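
On the untagged material inside <description>: a minimal sketch, assuming
the CDATA payload has already been pulled out as a character string (for
example with xmlValue on the //description nodes) and that every item
follows the layout shown above, with each field in its own <p> and the
abstract in the paragraph just before the PMID line. The names desc,
paras, pmid and abstract are made up for illustration, and the sample text
is abridged from the item above. It uses only base R regular expressions
rather than an HTML parser, so it would break if the feed layout changed.

```r
# Abridged, hypothetical copy of the text inside one <description> element.
desc <- paste(
  "<p><b>Gastroesophageal reflux in patients with recurrent laryngeal papillomatosis.</b></p>",
  "<p>Rev Bras Otorrinolaringol (Engl Ed). 2007 Mar-Apr;73(2):210-4 </p>",
  "<p>Authors: Pignatari SS, Liriano RY, Avelino MA, Testa JR, Fujita R, De Marco EK</p>",
  "<p>The children underwent 24-hour double-probe pH-metry.</p>",
  "<p>PMID: 17589729 [PubMed - in process]</p>",
  sep = "\n"
)

# Pull out each <p>...</p> block; (?s) lets "." match newlines, ".*?" is lazy.
m <- gregexpr("(?s)<p>.*?</p>", desc, perl = TRUE)
paras <- regmatches(desc, m)[[1]]

# Strip the remaining tags and tidy the whitespace.
paras <- gsub("<[^>]+>", "", paras)
paras <- trimws(gsub("\\s+", " ", paras, perl = TRUE))

# The PMID paragraph is recognisable by its prefix.
i <- grep("^PMID:", paras)
pmid <- sub("^PMID: *([0-9]+).*$", "\\1", paras[i])

# Assumption: the abstract is the paragraph immediately before the PMID line.
abstract <- paras[i - 1]

pmid
abstract
```

For a whole feed, the same steps could be applied to each description
string in turn, e.g. with sapply. A more completely tagged output format,
as suggested above, may still be the more robust route.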