On 4 Nov 2013 19:30, "David Winsemius" <dwinsem...@comcast.net> wrote:
> Maybe you should use their "download" facility rather than trying to
> deparse a complex webpage with lots of special user interaction "features":
>
> http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do

That web page depends on the user having already been to the previous page
to set up a session, so directly downloading a dataset requires setting up
cookies and making sure the request has all the right parameters. Looks
like a right pain.

> --
> David.
>
> On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
>
> > Thanks.
> > I had already introduced these minor adjustments in the code, but the
> > real problem (to me) is the information that gets lost: the informative
> > names of the columns, the indicator type and the units.
> >
> > Cheers
> >
> > Lorenzo
> >
> > On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> >
> >> Hello,
> >>
> >> If you want to get rid of the (bp) stuff, you can use lapply/gsub.
> >> Using Jean's code, slightly changed:
> >>
> >> library(XML)
> >>
> >> mylines <- readLines(url("http://bit.ly/1coCohq"))
> >> closeAllConnections()
> >> mytable <- readHTMLTable(mylines, which = 2, asText=TRUE, stringsAsFactors = FALSE)
> >>
> >> str(mytable)
> >>
> >> mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
> >> mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
> >> mytable[] <- lapply(mytable, as.numeric)
> >>
> >> colnames(mytable) <- 2000:2013
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> On 04-11-2013 09:53, Lorenzo Isella wrote:
> >>> Hello,
> >>> And thanks a lot.
> >>> This is indeed very close to what I need.
> >>> I am trying to figure out how not to "lose" the headers and how to avoid
> >>> downloading labels like "(p)" together with the numerical data I am
> >>> interested in.
> >>> If anyone on the list knows how to make these minor modifications, s/he
> >>> will make my life much easier.
> >>> Cheers
> >>>
> >>> Lorenzo
> >>>
> >>>
> >>> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvad...@usgs.gov> wrote:
> >>>
> >>>> Lorenzo,
> >>>>
> >>>> I may be able to help you get started. You can use the XML package to
> >>>> grab the information off the internet.
> >>>>
> >>>> library(XML)
> >>>>
> >>>> mylines <- readLines(url("http://bit.ly/1coCohq"))
> >>>> closeAllConnections()
> >>>> mylist <- readHTMLTable(mylines, asText=TRUE)
> >>>> mytable <- mylist$xTable
> >>>>
> >>>> However, when I look at the resulting object, mytable, it doesn't have
> >>>> informative row or column headings. Perhaps someone else can figure
> >>>> out how to get that information.
> >>>>
> >>>> Jean
> >>>>
> >>>>
> >>>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
> >>>> <lorenzo.ise...@gmail.com> wrote:
> >>>>> Dear All,
> >>>>> I often need to do some work on data which is publicly available
> >>>>> on the EUROSTAT website.
> >>>>> I have seen several ways to automatically download data (mainly the
> >>>>> bulk data) from EUROSTAT for later postprocessing with R, for instance
> >>>>>
> >>>>> http://bit.ly/HrDICj
> >>>>> http://bit.ly/HrDL10
> >>>>> http://bit.ly/HrDTgT
> >>>>>
> >>>>> However, what I would like to do is to be able to download directly
> >>>>> the csv file corresponding to a properly formatted dataset
> >>>>> (typically a dynamic dataset) from EUROSTAT.
> >>>>> To fix ideas, please consider the dataset at the following link
> >>>>>
> >>>>> http://bit.ly/1coCohq
> >>>>>
> >>>>> What I would like to do is to automatically read its content into R,
> >>>>> or at least to automatically download it as a csv file (full
> >>>>> extraction, single file, no flags and footnotes) which I can then
> >>>>> manipulate easily.
> >>>>> Any suggestion is appreciated.
> >>>>> Cheers
> >>>>>
> >>>>> Lorenzo
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
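
P.S. On the point about flags such as "(p)" being thrown away by the gsub step quoted above: a minimal sketch of one way to keep them in a parallel table instead of discarding them. It runs on a small made-up data frame rather than the live page, so the values, the column names, the helper names (to_number, extract_flag) and the treatment of ":" as a missing-value marker are all assumptions for illustration, not something taken from the EUROSTAT table itself.

```r
# Hypothetical stand-in for what readHTMLTable(..., stringsAsFactors = FALSE)
# returns: character columns with thousands separators and trailing flags.
raw <- data.frame(
  V1 = c("1,234.5", "987.0(p)"),
  V2 = c("1,300.1(b)", ":"),
  stringsAsFactors = FALSE
)

# Strip flags and separators, then convert to numeric.
to_number <- function(x) {
  x <- gsub("\\(.*\\)", "", x)  # drop anything in parentheses, e.g. "(p)"
  x <- gsub(",", "", x)         # drop thousands separators
  x <- trimws(x)
  x[x == ":"] <- NA             # assumption: ":" marks a missing value
  as.numeric(x)
}

# Return the text inside the parentheses, or NA when there is no flag.
extract_flag <- function(x) {
  ifelse(grepl("\\(.*\\)", x),
         sub(".*\\((.*)\\).*", "\\1", x),
         NA_character_)
}

# The df[] <- lapply(...) idiom keeps the data.frame shape.
values <- raw
values[] <- lapply(raw, to_number)

flags <- raw
flags[] <- lapply(raw, extract_flag)

# Column names are still supplied by hand; these years are made up.
colnames(values) <- colnames(flags) <- c("2012", "2013")

print(values)
print(flags)
```

This only addresses the flags. The indicator name and the units are not in the table cells at all, so they would still have to come from elsewhere on the page or from the download facility.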