Try the readHTMLTable function in package XML:

library(XML)
sheet2 <- readHTMLTable("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", skip.rows = 2)
head(sheet2[[1]])

On Sun, Feb 12, 2012 at 4:24 PM, Nilza BARROS <nilzabar...@gmail.com> wrote:

> Hi,
>
> I really appreciate your help. I definitely need a reusable program, since
> I have been asking someone to extract these data from the Internet every
> day. That is why I am trying to write a program to do it.
>
> Regarding the URL I sent: I have just realized that although I wrote the
> one for a single worksheet (PLANILHA2), when I copy it into my browser it
> shows the link with both worksheets.
>
> I am going to read about the RCurl and XML libraries, but I hope you can
> help me too.
>
> Thanks in advance,
> Nilza Barros
>
> On Sun, Feb 12, 2012 at 10:42 AM, CIURANA EUGENE (R) <r.u...@ciurana.eu> wrote:
>
> > On Sat, 11 Feb 2012 22:49:07 -0200, Nilza BARROS wrote:
> >
> > I have to read data from a worksheet that is available on the Internet.
> > I have been doing this by copying the worksheet from the browser, but I
> > would like to be able to copy the data automatically using the "url"
> > command.
> >
> > When I use the "url" command, however, the result is the source code,
> > I mean, HTML code. I can see that the data I need is in the source, but
> > before trying to read the data from the HTML code I wonder if there is
> > a package or another way to extract these data, since reading from the
> > code will demand much work and may not be very accurate.
> >
> > Below is where I am trying to export the data from:
> >
> > dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", "r")
> >
> >
> > Hi Nilza,
> >
> > The URL that you posted points at a document that has another document
> > within it, in a frame. These files are Excel dumps into HTML. To view
> > the actual data you need the URIs for each data set. Those appear at the
> > bottom of the listing, under sc1201_arquivos/sheet001.htm and
> > sheet002.htm. Your code must fetch these files, not the one at
> > http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm which
> > only "wraps" them. Most of what you see in the file that you linked
> > isn't HTML - it's JavaScript and style information for the data living
> > in the two separate HTML documents.
> >
> > You can do this in R using the RCurl and XML libraries, by pulling the
> > specific files for each data source. If this is a one-time thing, I'd
> > suggest just coding something simple that loads the data for each file.
> > If this is something you'll execute periodically, you'll need a bit more
> > code to extract the internal data sheets (e.g. the "planilhas" at the
> > bottom), and then to extract the actual data.
> >
> > Let me know if you want this as a one-time thing, or as a reusable
> > program. If you don't know how to use RCurl and XML to parse HTML I'll
> > be happy to help with that too. I'd just like to know more about the
> > scope of your question.
> >
> > Cheers,
> >
> > pr3d
> >
> > --
> > pr3d4t0r at #R, ##java, #awk, #python irc.freenode.net
>
> --
> Regards,
> Nilza Barros
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
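
P.S. Since the thread mentions two worksheet pages (sheet001.htm and sheet002.htm) and a need for something reusable, here is a minimal sketch of how the same readHTMLTable call might be looped over both. The helper names sheet_urls and read_sheets are hypothetical, and the sketch assumes the pages keep the sheetNNN.htm naming and that the first table on each page holds the data; neither assumption is confirmed beyond what the thread shows.

```r
# Hypothetical helper: build the URL of each worksheet page, assuming the
# pages are named sheet001.htm, sheet002.htm, ... under one directory.
sheet_urls <- function(base_url, n_sheets) {
  sprintf("%s/sheet%03d.htm", base_url, seq_len(n_sheets))
}

# Hypothetical helper: read the first HTML table found on each page.
# skip.rows = 2 mirrors the call suggested earlier in this message.
read_sheets <- function(urls, skip.rows = 2) {
  lapply(urls, function(u) {
    tables <- XML::readHTMLTable(u, skip.rows = skip.rows)
    tables[[1]]
  })
}

base_url <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos"
urls <- sheet_urls(base_url, 2)

# Fetching needs network access and the XML package, so it only runs
# in an interactive session where XML is installed.
if (interactive() && requireNamespace("XML", quietly = TRUE)) {
  sheets <- read_sheets(urls)
  print(head(sheets[[2]]))
}
```

To run this every day, the script could be saved to a file and scheduled, changing base_url whenever the directory name (sc1201_arquivos here) changes.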