Re: [R] [R-sig-DB] Reading data from a worksheet on the Internet

Henrique Dallazuanna Sun, 12 Feb 2012 10:37:15 -0800

Try the readHTMLTable function in package XML:

library(XML)

sheet2 <- readHTMLTable(
  "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm",
  skip.rows = 2)


head(sheet2[[1]])
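
Since you asked for something reusable, here is a rough sketch of how the call above could be wrapped in a function. The helper name read_sheet, its arguments, and the small inline HTML sample are my own assumptions for illustration; they are not part of the XML package or of your data. The sample table only exists so the function can be tried without a network connection.

```r
library(XML)

# Hypothetical helper (name and arguments are assumptions, not from the
# XML package): parse one exported worksheet and return its first table.
#   src     - a URL/file name, or the HTML itself when as_text = TRUE
#   skip    - row indices to drop, passed on to readHTMLTable(skip.rows = )
read_sheet <- function(src, skip = integer(), as_text = FALSE) {
  doc <- htmlParse(src, asText = as_text)
  tables <- readHTMLTable(doc, header = FALSE, skip.rows = skip)
  if (length(tables) == 0) {
    stop("no <table> element found in the document")
  }
  tables[[1]]
}

# Made-up stand-in for a worksheet page: two leading non-data rows,
# then two data rows. The values are placeholders, not real buoy data.
sample_html <- paste0(
  "<html><body><table>",
  "<tr><td>meta1</td><td>meta2</td><td>meta3</td></tr>",
  "<tr><td>col_a</td><td>col_b</td><td>col_c</td></tr>",
  "<tr><td>1</td><td>2</td><td>3</td></tr>",
  "<tr><td>4</td><td>5</td><td>6</td></tr>",
  "</table></body></html>")

# Drop the two leading rows and keep only the data rows.
demo <- read_sheet(sample_html, skip = 1:2, as_text = TRUE)
print(demo)

# For the real pages, something along these lines should work (not run here):
# base_url <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/"
# sheets <- lapply(paste0(base_url, c("sheet001.htm", "sheet002.htm")),
#                  read_sheet, skip = 2)
```

The columns come back as text (or factors, depending on your R version), so you will probably want to convert them with as.numeric(as.character(...)) before doing any calculations.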

On Sun, Feb 12, 2012 at 4:24 PM, Nilza BARROS <nilzabar...@gmail.com> wrote:

> Hi,
>
> I really appreciate your help. I definitely need a reusable program, since
> I have been asking someone to extract these data from the Internet
> every day. That is why I am trying to write a program to do it.
> Regarding the URL I sent, I have just realized that although I wrote
> the one for a single worksheet (PLANILHA2), when I copy it into my browser
> it shows the link with both worksheets.
>
>
> I am going to read about the RCurl and XML libraries, but I hope you can
> help me too.
>
> Thanks in advance
> Nilza Barros
>
>
> On Sun, Feb 12, 2012 at 10:42 AM, CIURANA EUGENE (R) <r.u...@ciurana.eu> wrote:
>
> > On Sat, 11 Feb 2012 22:49:07 -0200, Nilza BARROS wrote:
> >
> > I have to read data from a worksheet that is available on the Internet. I
> > have been doing this by copying the worksheet from the browser, but I
> > would like to copy the data automatically using the url command.
> >
> > When I use the "url" command, however, the result is the page's HTML
> > source code. I can see that the data I need are in the source, but before
> > trying to read them from the HTML I wonder whether there is a package or
> > another way to extract these data, since parsing the source by hand would
> > take a lot of work and might not be very accurate.
> >
> > Below is the address from which I am trying to read the data:
> >
> > dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", "r")
> >
> >
> >
> > Hi Nilza,
> >
> > The URL that you posted points at a document that has another document
> > within it, in a frame.  These files are Excel dumps into HTML.  To view
> > the actual data you need the URIs for each data set.  Those appear at the
> > bottom of the listing, under sc1201_arquivos/sheet001.htm and
> > sheet002.htm.  Your code must fetch these files, not the one at
> > http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm which
> > only "wraps" them.  Most of what you see in the file that you linked
> > isn't HTML - it's JavaScript and style information for the data living in
> > the two separate HTML documents.
> >
> > You can do this in R using the RCurl and XML libraries, by pulling the
> > specific files for each data source.  If this is a one-time thing, I'd
> > suggest just coding something simple that loads the data for each file.
> > If this is something you'll execute periodically, you'll need a bit more
> > code to extract the internal data sheets (e.g. the "planilhas" at the
> > bottom), and then to extract the actual data.
> >
> > Let me know if you want this as a one-time thing, or as a reusable
> > program.  If you don't know how to use RCurl and XML to parse HTML I'll
> > be happy to help with that too.  I'd just like to know more about the
> > scope of your question.
> >
> > Cheers,
> >
> > pr3d
> >
> > --
> > pr3d4t0r at #R, ##java, #awk, #python irc.freenode.net
> >
> >
>
>
> --
> Abraço,
> Nilza Barros
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
