Hi, I really appreciate your help. I definitely need a reusable program, since until now I have been asking someone to extract these data from the Internet every day; that is why I am trying to write a program to do it myself.

Regarding the URL I sent: I have just realized that although I wrote the one pointing at only one worksheet (PLANILHA2), when I copy it into my browser it shows the link with both worksheets.
I am going to read about the RCurl and XML libraries, but I hope you can help me too. Thanks in advance,
Nilza Barros

On Sun, Feb 12, 2012 at 10:42 AM, CIURANA EUGENE (R) <r.u...@ciurana.eu> wrote:

> On Sat, 11 Feb 2012 22:49:07 -0200, Nilza BARROS wrote:
>
>> I have to read data from a worksheet that is available on the Internet.
>> I have been doing this by copying the worksheet from the browser, but I
>> would like to fetch the data automatically using the url() command.
>>
>> When I use url(), however, the result is the page source, i.e. HTML code.
>> I can see the data I need in the source code, but before trying to read
>> the data out of the raw HTML I wonder whether there is a package or
>> another way to extract these data, since parsing the code by hand will
>> take a lot of work and may not be accurate.
>>
>> Below one can see the call with which I am trying to export the data:
>>
>> dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", "r")
>
> Hi Nilza,
>
> The URL that you posted points at a document that has another document
> within it, in a frame. These files are Excel dumps into HTML. To view the
> actual data you need the URIs for each data set. Those appear at the
> bottom of the listing, under sc1201_arquivos/sheet001.htm and sheet002.htm.
> Your code must fetch these files, not the one at
> http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm, which
> only "wraps" them. Most of what you see in the file that you linked isn't
> HTML - it's JavaScript and style information for the data living in the two
> separate HTML documents.
>
> You can do this in R using the RCurl and XML libraries, by pulling the
> specific files for each data source. If this is a one-time thing, I'd
> suggest just coding something simple that loads the data from each file.
> If this is something you'll execute periodically, you'll need a bit more
> code to extract the internal data sheets (e.g. the "planilhas" at the
> bottom), then extract the actual data.
>
> Let me know if you want this as a one-time thing or as a reusable
> program. If you don't know how to use RCurl and XML to parse HTML, I'll
> be happy to help with that too. I'd just like to know more about the
> scope of your question.
>
> Cheers,
>
> pr3d
>
> --
> pr3d4t0r at #R, ##java, #awk, #python irc.freenode.net

--
Abraço,
Nilza Barros
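[Following up on the RCurl/XML suggestion above, here is a minimal sketch of what a reusable fetch could look like. The sheet URL is the one from the thread; the use of the first table (`tables[[1]]`) and the `readHTMLTable` options are assumptions that may need adjusting for the actual page layout.]

```r
# Sketch only: fetch one of the inner Excel-dump sheets and parse its table.
library(RCurl)   # getURL()
library(XML)     # readHTMLTable()

sheet_url <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm"

html   <- getURL(sheet_url)                              # download the raw HTML
tables <- readHTMLTable(html, stringsAsFactors = FALSE)  # all <table> elements as data frames
dados  <- tables[[1]]                                    # first table on the page (assumption)
head(dados)
```

Since the wrapper page at sc1202.htm only frames the two sheets, the same call can be run for sheet001.htm as well, and the whole thing wrapped in a function for daily use.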
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.