Re: [R] Grap Element from Web Page

Jeffrey Dick Wed, 14 Aug 2013 02:21:37 -0700

Hi,

There are many occurrences of the CIK number in the page source. This pulls
out the first node containing it:


node <- getNodeSet(doc[[1]], "//link[@rel='alternate']" )

From there you can extract the number. Here's one way to do it.

strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]
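The nested strsplit() call depends on the href being the fifth element of
unlist(node) and on "&type" following the number. As a rough sketch only,
a regular expression on the href string is a bit less position-dependent.
Here extract_cik() is a hypothetical helper name, and sample_href is an
assumed example of what the link's href might look like, not text copied
from the page:

```r
# Hypothetical helper: return the run of digits that follows "CIK=" in a
# string, or NA if there is none. Base R only, no extra packages needed.
extract_cik <- function(href) {
  hit <- regmatches(href, regexpr("(?<=CIK=)[0-9]+", href, perl = TRUE))
  if (length(hit) == 0) NA_character_ else hit
}

# Assumed example of an href of the kind the <link rel='alternate'> node holds.
sample_href <- paste0("/cgi-bin/browse-edgar?action=getcompany",
                      "&CIK=0000789019&type=&owner=exclude&output=atom")

cik <- extract_cik(sample_href)
print(cik)
```

With the real page you would pass in the href taken from the node instead
of sample_href. The result stays a character string so that leading zeros
are kept.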

Jeff


On Wed, Aug 14, 2013 at 1:34 PM, Sparks, John James <[email protected]> wrote:

> Dear R Helpers,
>
> I would like to pull the CIK number from the web page
>
>
> http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany
>
> If you put this web page into your browser you will see the CIK number in
> red on the left side of the page near the top.
>
> When I try the basic
> require(scrapeR)
> require(XML)
> require(RCurl)
> doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
> str(doc)
>
> I get a large number of items in the data frame that I don't know how to
> interpret.  Both
> tables <- readHTMLTable(doc)
>
> and
>
> list<-xmlToList(doc)
>
> result in errors.
>
> Any (positive) guidance would be much appreciated.
>
> --John J. Sparks, Ph.D.
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
