On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:

> Thanks.
> I had already introduced these minor adjustments in the code, but the real
> problem (to me) is the information that gets lost: the informative names of
> the columns, the indicator type and the units.

Maybe you should use their "download" facility rather than trying to parse
a complex webpage with lots of special user interaction "features":

http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do

--
David.

> Cheers
>
> Lorenzo
>
> On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
>> Hello,
>>
>> If you want to get rid of the (bp) stuff, you can use lapply/gsub. Using
>> Jean's code a bit changed,
>>
>> library(XML)
>>
>> # Read the page source, then parse only the second table as text
>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>> closeAllConnections()
>> mytable <- readHTMLTable(mylines, which = 2, asText = TRUE,
>>                          stringsAsFactors = FALSE)
>>
>> str(mytable)
>>
>> # Strip flags in parentheses, e.g. "(p)" or "(bp)"
>> mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
>> # Strip thousands separators, then convert every column to numeric
>> mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
>> mytable[] <- lapply(mytable, as.numeric)
>>
>> colnames(mytable) <- 2000:2013
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 04-11-2013 09:53, Lorenzo Isella wrote:
>>> Hello,
>>> And thanks a lot.
>>> This is indeed very close to what I need.
>>> I am trying to figure out how not to "lose" the headers and how to avoid
>>> downloading labels like "(p)" together with the numerical data I am
>>> interested in.
>>> If anyone on the list knows how to make these minor modifications, s/he
>>> will make my life much easier.
>>> Cheers
>>>
>>> Lorenzo
>>>
>>>
>>> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvad...@usgs.gov> wrote:
>>>
>>>> Lorenzo,
>>>>
>>>> I may be able to help you get started. You can use the XML package to
>>>> grab the information off the internet.
>>>>
>>>> library(XML)
>>>>
>>>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>>>> closeAllConnections()
>>>> mylist <- readHTMLTable(mylines, asText = TRUE)
>>>> mytable <- mylist$xTable
>>>>
>>>> However, when I look at the resulting object, mytable, it doesn't have
>>>> informative row or column headings. Perhaps someone else can figure
>>>> out how to get that information.
>>>>
>>>> Jean
>>>>
>>>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
>>>> <lorenzo.ise...@gmail.com> wrote:
>>>>> Dear All,
>>>>> I often need to do some work on data that is publicly available
>>>>> on the EUROSTAT website.
>>>>> I saw several ways to automatically download mainly the bulk data
>>>>> from EUROSTAT to later postprocess it with R, for instance
>>>>>
>>>>> http://bit.ly/HrDICj
>>>>> http://bit.ly/HrDL10
>>>>> http://bit.ly/HrDTgT
>>>>>
>>>>> However, what I would like to do is to download directly
>>>>> the csv file corresponding to a properly formatted dataset
>>>>> (typically a dynamic dataset) from EUROSTAT.
>>>>> To fix ideas, please consider the dataset at the following link
>>>>>
>>>>> http://bit.ly/1coCohq
>>>>>
>>>>> What I would like to do is to automatically read its content into R,
>>>>> or at least to automatically download it as a csv file (full
>>>>> extraction, single file, no flags and footnotes) which I can then
>>>>> manipulate easily.
>>>>> Any suggestion is appreciated.
>>>>> Cheers
>>>>>
>>>>> Lorenzo
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
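
As a small aside on the lapply/gsub cleanup quoted in this thread, here is a
minimal sketch in R that runs without any network access. The toy data frame
`raw`, its column names and its values are invented for illustration and are
not taken from the EUROSTAT page, and the helper `clean_cell` is a
hypothetical name. Treating ":" as a missing-value marker is an assumption
as well. The point is only that assigning through `[]` keeps whatever column
names the data frame already has while the cells are converted.

```r
# Hypothetical toy data standing in for a scraped table (values invented).
raw <- data.frame(
  `2011` = c("1,234.5", "98.7(p)", ":"),
  `2012` = c("1,300.0(b)", "101.2", "55.1(bp)"),
  check.names = FALSE,       # keep "2011"/"2012" instead of "X2011"/"X2012"
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one character column into numbers.
clean_cell <- function(x) {
  x <- gsub("\\([^)]*\\)", "", x)      # drop flags such as "(p)" or "(bp)"
  x <- gsub(",", "", x, fixed = TRUE)  # drop thousands separators
  x <- trimws(x)
  x[x %in% ":"] <- NA                  # assumption: ":" marks a missing value
  as.numeric(x)
}

cleaned <- raw
cleaned[] <- lapply(raw, clean_cell)   # `[]` keeps the data frame and its names

str(cleaned)
```

If the real table carries its headers in a separate row or table, those would
still have to be read and assigned separately; this sketch only covers the
cell cleanup.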