This looks as though you need to be a little XML old-school.
readHTMLTable is a summary function drawing on:

?htmlTreeParse()   turns the table into XML
?xpathApply()
and more.

# xpathApply(doc, "//td", function(x) xmlValue(x))
#   breaks each line at the end of a table cell and extracts the value
# The "//th" picks out the table headings without distinction as to
#   whether they are rows or columns

This is followed by various gsub() calls and turning the result into a
matrix (as it comes out as a list of values without columns).

I couldn't identify the headings, but the table body is definitely doable.

readHTMLTable seems to assume that the column headings are a single row,
which isn't always the case.

Paul Bivand

On 5 November 2013 18:44, Barry Rowlingson <b.rowling...@lancaster.ac.uk> wrote:
> On 4 Nov 2013 19:30, "David Winsemius" <dwinsem...@comcast.net> wrote:
>
>> Maybe you should use their "download" facility rather than trying to
>> deparse a complex webpage with lots of special user interaction "features":
>>
>> http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do
>>
>
> That web page depends on the user already having been to the previous page
> to set up a session and so directly downloading a dataset requires setting
> up cookies and making sure the request has all the right parameters. Looks
> like a right pain.
>
>> --
>> David.
>>
>>
>> On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
>>
>> > Thanks.
>> > I had already introduced these minor adjustments in the code, but the
>> > real problem (to me) is the information that gets lost: the informative
>> > name of the columns, the indicator type and the units.
>> > Cheers
>> >
>> > Lorenzo
>> >
>> > On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>> >
>> >> Hello,
>> >>
>> >> If you want to get rid of the (bp) stuff, you can use lapply/gsub.
>> >> Using Jean's code a bit changed,
>> >>
>> >> library(XML)
>> >>
>> >> mylines <- readLines(url("http://bit.ly/1coCohq"))
>> >> closeAllConnections()
>> >> mytable <- readHTMLTable(mylines, which = 2, asText=TRUE,
>> >>                          stringsAsFactors = FALSE)
>> >>
>> >> str(mytable)
>> >>
>> >> mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
>> >> mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
>> >> mytable[] <- lapply(mytable, as.numeric)
>> >>
>> >> colnames(mytable) <- 2000:2013
>> >>
>> >>
>> >> Hope this helps,
>> >>
>> >> Rui Barradas
>> >>
>> >> On 04-11-2013 09:53, Lorenzo Isella wrote:
>> >>> Hello,
>> >>> And thanks a lot.
>> >>> This is indeed very close to what I need.
>> >>> I am trying to figure out how not to "lose" the headers and how to avoid
>> >>> downloading labels like "(p)" together with the numerical data I am
>> >>> interested in.
>> >>> If anyone on the list knows how to make these minor modifications, s/he
>> >>> will make my life much easier.
>> >>> Cheers
>> >>>
>> >>> Lorenzo
>> >>>
>> >>>
>> >>> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvad...@usgs.gov> wrote:
>> >>>
>> >>>> Lorenzo,
>> >>>>
>> >>>> I may be able to help you get started. You can use the XML package to
>> >>>> grab the information off the internet.
>> >>>>
>> >>>> library(XML)
>> >>>>
>> >>>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>> >>>> closeAllConnections()
>> >>>> mylist <- readHTMLTable(mylines, asText=TRUE)
>> >>>> mytable <- mylist$xTable
>> >>>>
>> >>>> However, when I look at the resulting object, mytable, it doesn't have
>> >>>> informative row or column headings. Perhaps someone else can figure
>> >>>> out how to get that information.
>> >>>>
>> >>>> Jean
>> >>>>
>> >>>>
>> >>>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
>> >>>> <lorenzo.ise...@gmail.com> wrote:
>> >>>>> Dear All,
>> >>>>> I often need to do some work on some data which is publicly available
>> >>>>> on the EUROSTAT website.
>> >>>>> I saw several ways to download automatically mainly the bulk data
>> >>>>> from EUROSTAT to later on postprocess it with R, for instance
>> >>>>>
>> >>>>> http://bit.ly/HrDICj
>> >>>>> http://bit.ly/HrDL10
>> >>>>> http://bit.ly/HrDTgT
>> >>>>>
>> >>>>> However, what I would like to do is to be able to download directly
>> >>>>> the csv file corresponding to a properly formatted dataset
>> >>>>> (typically a dynamic dataset) from EUROSTAT.
>> >>>>> To fix the ideas, please consider the dataset at the following link
>> >>>>>
>> >>>>> http://bit.ly/1coCohq
>> >>>>>
>> >>>>> what I would like to do is to automatically read its content into R,
>> >>>>> or at least to automatically download it as a csv file (full
>> >>>>> extraction, single file, no flags and footnotes) which I can then
>> >>>>> manipulate easily.
>> >>>>> Any suggestion is appreciated.
>> >>>>> Cheers
>> >>>>>
>> >>>>> Lorenzo
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help@r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide
>> >>>>> http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>> David Winsemius
>> Alameda, CA, USA
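
The approach Paul describes at the top of the thread (parse with htmlTreeParse(), pull the cells out with an XPath query, clean them with gsub(), then reshape into a matrix) might look roughly like the sketch below. It is only an illustration under stated assumptions: the inline HTML string is a hypothetical stand-in for the Eurostat page, the table has a single heading row, and flags appear in parentheses after the value. Multi-row headings, which Paul notes are the hard part, are not handled here.

```r
library(XML)

# Hypothetical stand-in for the Eurostat page: a small table whose
# values carry flags such as "(p)" and thousands separators.
html <- '<table>
  <tr><th>2000</th><th>2001</th><th>2002</th></tr>
  <tr><td>1,234.5</td><td>1,300.0(p)</td><td>1,410.2</td></tr>
  <tr><td>987.1</td><td>990.4</td><td>1,002.3(b)</td></tr>
</table>'

# Parse to an internal document so XPath queries can be run against it.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Heading cells and body cells come back as flat character vectors.
headings <- xpathSApply(doc, "//th", xmlValue)
cells    <- xpathSApply(doc, "//td", xmlValue)

# Strip flags and thousands separators, then convert to numeric
# (the same gsub() patterns Rui uses on the readHTMLTable result).
cells  <- gsub("\\(.*\\)", "", cells)
cells  <- gsub(",", "", cells)
values <- as.numeric(cells)

# Rebuild the rectangular shape. The column count is taken from the
# number of headings, which only works with a single heading row.
body <- matrix(values, ncol = length(headings), byrow = TRUE)
colnames(body) <- headings
body
```

Because "//th" returns every heading cell without saying whether it labels a row or a column, a page with both kinds would need more specific XPath expressions than the ones assumed here.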