Hi,

There are many occurrences of the CIK number in the page source. This pulls out the first node containing it:

node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")

From there you can extract the number. Here's one way to do it:

strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]

Jeff

On Wed, Aug 14, 2013 at 1:34 PM, Sparks, John James <jspa...@uic.edu> wrote:
> Dear R Helpers,
>
> I would like to pull the CIK number from the web page
>
> http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany
>
> If you put this web page into your browser you will see the CIK number in
> red on the left side of the page near the top.
>
> When I try the basic
> require(scrapeR)
> require(XML)
> require(RCurl)
> doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
> str(doc)
>
> I get a large number of items in the data frame that I don't know how to
> interpret. Both
> tables <- readHTMLTable(doc)
>
> and
>
> list<-xmlToList(doc)
>
> result in errors.
>
> Any (positive) guidance would be much appreciated.
>
> --John J. Sparks, Ph.D.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
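
P.S. If the nested strsplit() call is hard to read, a regular expression can do the same extraction. The following is only a sketch in base R: the href string is a made-up stand-in for whatever the link node actually contains (the digits are a placeholder, not a real CIK), and extract_cik is a hypothetical helper name, not part of the XML package.

```r
# Hypothetical href value, shaped like the text the strsplit() approach expects.
# The CIK digits are a placeholder for illustration only.
href <- "/cgi-bin/browse-edgar?action=getcompany&CIK=0000123456&type=&dateb=&owner=exclude&count=40"

# Return the run of digits that follows "CIK=" in a single string,
# or NA if the string contains no such pattern.
extract_cik <- function(x) {
  m <- regmatches(x, regexpr("CIK=[0-9]+", x))
  if (length(m) == 0) return(NA_character_)
  sub("^CIK=", "", m)
}

cik <- extract_cik(href)
print(cik)
```

Matching on digits rather than splitting on "&type" means the result does not depend on which query parameter happens to follow the CIK, and keeping the value as a character string preserves any leading zeros.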