On Sat, 11 Feb 2012 22:49:07 -0200, Nilza BARROS wrote:
> I have to read data from a worksheet that is available on the Internet. I
> have been doing this by copying the worksheet from the browser.
> But I would like to be able to copy the data automatically using the url
> command.
>
> But when using "url" command the result is the source code, I mean, a html
> code.
> I see that the data I need is in the source code but before thinking about
> reading the data from the html code I wonder if there is a package or
> another way to extract these data since reading from the code will demand
> many work and it can be not so accurate.
>
> Below one can see the from where I am trying to export the data:
>
> dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm","r")

Hi Nilza,

The URL that you posted points at a document that has another document
within it, in a frame. These files are Excel dumps into HTML. To view the
actual data you need the URIs for each data set. Those appear at the bottom
of the listing, under sc1201_arquivos/sheet001.htm and sheet002.htm. Your
code must fetch these files, not the one at
http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm [1],
which only "wraps" them.

Most of what you see in the file that you linked isn't HTML; it's JavaScript
and style information for the data that lives in the two separate HTML
documents.

You can do this in R using the RCurl and XML packages, by pulling the
specific file for each data source. If this is a one-time thing, I'd suggest
just coding something simple that loads the data from each file. If this is
something you'll run periodically, you'll need a bit more code to first
extract the internal data sheets (e.g. the "planilhas" at the bottom) and
then extract the actual data from each of them.

Let me know if you want this as a one-time thing, or as a reusable program.
If you don't know how to use RCurl and XML to parse HTML I'll be happy to
help with that too. I'd just like to know more about the scope of your
question.
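For the one-time case, here is a rough sketch of what I have in mind. It assumes that each sheet stores its data in ordinary HTML <table> elements and that sheet001.htm sits in the same directory as the sheet002.htm you posted; I haven't verified either. The function names parse_sheet() and fetch_sheet() are made up for illustration; getURL() comes from RCurl, and htmlParse() and readHTMLTable() come from XML.

```r
# Sketch only: fetch one Excel-exported HTML sheet and parse its tables.
# Assumption: the data lives in plain <table> elements inside each sheet.
library(XML)  # htmlParse(), readHTMLTable()

# Hypothetical helper: turn HTML text into a list of data frames,
# one per <table> found in the document.
parse_sheet <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  # header = FALSE keeps every row as data; adjust once you have seen the layout.
  readHTMLTable(doc, header = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical helper: download one sheet with RCurl, then parse it.
fetch_sheet <- function(sheet_url) {
  html_text <- RCurl::getURL(sheet_url)
  parse_sheet(html_text)
}

if (interactive()) {
  # Directory taken from the URL in the question; sheet001.htm is assumed
  # to live next to sheet002.htm.
  base <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/"
  sheets <- lapply(paste0(base, c("sheet001.htm", "sheet002.htm")), fetch_sheet)
  str(sheets[[2]])  # inspect what came back for sheet002.htm
}
```

Excel dumps tend to carry blank rows, merged cells and a non-UTF-8 encoding, so expect to clean up the resulting data frames by hand before using them.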

Cheers,

pr3d

--
pr3d4t0r at #R, ##java, #awk, #python
irc.freenode.net

Links:
------
[1] http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm