Hi, I really appreciate your help. I definitely need a reusable program, since until now I have been asking someone to extract these data from the Internet every day; that is why I am trying to write a program to do it myself.

Regarding the URL I sent: I have just realized that although I wrote the one pointing at only one worksheet (PLANILHA2), when I copy it into my browser it shows the link with both worksheets.
I am going to read about the RCurl and XML libraries, but I hope you can help me too. Thanks in advance,
Nilza Barros

On Sun, Feb 12, 2012 at 10:42 AM, CIURANA EUGENE (R) <r.u...@ciurana.eu> wrote:

> On Sat, 11 Feb 2012 22:49:07 -0200, Nilza BARROS wrote:
>
>> I have to read data from a worksheet that is available on the Internet.
>> I have been doing this by copying the worksheet from the browser, but I
>> would like to fetch the data automatically using the url() command.
>>
>> When I use url(), however, the result is the page source, i.e. HTML code.
>> I can see the data I need in the source code, but before trying to read
>> the data out of the raw HTML I wonder whether there is a package or
>> another way to extract these data, since parsing the code by hand will
>> take a lot of work and may not be accurate.
>>
>> Below one can see the call with which I am trying to export the data:
>>
>> dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", "r")
>
> Hi Nilza,
>
> The URL that you posted points at a document that has another document
> within it, in a frame. These files are Excel dumps into HTML. To view the
> actual data you need the URIs for each data set. Those appear at the
> bottom of the listing, under sc1201_arquivos/sheet001.htm and sheet002.htm.
> Your code must fetch these files, not the one at
> http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm, which
> only "wraps" them. Most of what you see in the file that you linked isn't
> HTML - it's JavaScript and style information for the data living in the two
> separate HTML documents.
>
> You can do this in R using the RCurl and XML libraries, by pulling the
> specific files for each data source. If this is a one-time thing, I'd
> suggest just coding something simple that loads the data from each file.
> If this is something you'll execute periodically, you'll need a bit more
> code to extract the internal data sheets (e.g. the "planilhas" at the
> bottom), then extract the actual data.
>
> Let me know if you want this as a one-time thing or as a reusable
> program. If you don't know how to use RCurl and XML to parse HTML, I'll
> be happy to help with that too. I'd just like to know more about the
> scope of your question.
>
> Cheers,
>
> pr3d
>
> --
> pr3d4t0r at #R, ##java, #awk, #python irc.freenode.net

--
Abraço,
Nilza Barros
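[Following up on the RCurl/XML suggestion above, here is a minimal sketch of what a reusable fetch could look like. The sheet URL is the one from the thread; the use of the first table (`tables[[1]]`) and the `readHTMLTable` options are assumptions that may need adjusting for the actual page layout.]

```r
# Sketch only: fetch one of the inner Excel-dump sheets and parse its table.
library(RCurl)   # getURL()
library(XML)     # readHTMLTable()

sheet_url <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm"

html   <- getURL(sheet_url)                              # download the raw HTML
tables <- readHTMLTable(html, stringsAsFactors = FALSE)  # all <table> elements as data frames
dados  <- tables[[1]]                                    # first table on the page (assumption)
head(dados)
```

Since the wrapper page at sc1202.htm only frames the two sheets, the same call can be run for sheet001.htm as well, and the whole thing wrapped in a function for daily use.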
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.