They also seem to release their data in XML and CSV formats... why are you scraping?

-- 
Sent from my phone. Please excuse my brevity.

On January 23, 2018 9:31:01 AM PST, Ilio Fornasero <iliofornas...@hotmail.com> wrote:
>I am doing research on World Bank (WB) projects in developing
>countries. To do so, I am scraping their website to collect the data
>I am interested in.
>
>The structure of the webpage I want to scrape is the following:
>
>1. List of countries: the list of all countries in which the WB has
>developed projects <http://projects.worldbank.org/country?lang=en&page=>
>
>1.1. Clicking on a single country in 1. gives that country's project
>list, which spans many pages and includes all the projects in that
>country
><http://projects.worldbank.org/search?lang=en&searchTerm=&countrycode_exact=3A>.
>Here I have included just one page for a single country, but every
>country has several such pages.
>
>1.1.1. Clicking on a single project in 1.1. gives, among other things,
>the project's overview option
><http://projects.worldbank.org/P155642/?lang=en&tab=overview>, which is
>what I am interested in.
>
>In other words, my problem is to find a way to create a data frame
>that includes all the countries, a complete list of all projects for
>each country, and an overview of every single project.
>
>
>This is the code that I have (unsuccessfully) written:
>
>WB_links <- "http://projects.worldbank.org/country?lang=en&page=projects"
>
>WB_proj <- function(x) {
>
>  Sys.sleep(5)
>  url <- sprintf("http://projects.worldbank.org/search?lang=en&searchTerm=&countrycode_exact=%s", x)
>
>  html <- read_html(url)
>
>  tibble(title = html_nodes(html, ".grid_20") %>% html_text(trim = TRUE),
>         project_url = html_nodes(html, ".grid_20") %>% html_attr("href"))
>}
>
>WB_scrape <- map_df(1:5, WB_proj) %>%
>  mutate(study_description =
>           map(project_url,
>               ~read_html(sprintf("http://projects.worldbank.org/search?lang=en&searchTerm=&countrycode_exact=%s", .x)) %>%
>                 html_node() %>%
>                 html_text()))
>
>
>Any suggestions?
>
>Note: I am sorry if this question seems trivial, but I am quite a
>newbie in R and I haven't found any help on this by looking around
>(though I could have missed something, of course).
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
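
If scraping is still the preferred route, the three levels described in the question (country list, per-country project list, project overview) could be sketched roughly as below, using the same rvest and tibble packages as the posted code. Everything site-specific here is an assumption that has not been checked against the live pages: the selector ".grid_20 a", the helper names extract_links, get_projects and get_overview, and the demo HTML fragment, which is hand-written and is not the real World Bank markup. Three points in the posted code look like likely trouble spots: html_attr("href") only returns a value when the selected node itself carries an href (usually the <a> element, not its container); html_node() needs a css or xpath argument; and map_df(1:5, ...) feeds running integers into a URL parameter that expects a country code.

```r
library(rvest)   # read_html(), html_nodes(), html_node(), html_attr(), html_text()
library(tibble)  # tibble()

# Pull link text and href from every node matching `css` on a parsed page.
# `css` is a placeholder; the real selector has to be read off the live page.
extract_links <- function(page, css) {
  nodes <- html_nodes(page, css)
  tibble(
    title = html_text(nodes, trim = TRUE),
    url   = html_attr(nodes, "href")
  )
}

# Level 1.1: one page of one country's project list.
# `country_code` must be whatever value the site uses in countrycode_exact.
# Paging through the remaining result pages is NOT handled here.
get_projects <- function(country_code, css = ".grid_20 a", delay = 5) {
  Sys.sleep(delay)  # pause between requests, as in the posted code
  url <- sprintf(
    "http://projects.worldbank.org/search?lang=en&searchTerm=&countrycode_exact=%s",
    country_code
  )
  links <- extract_links(read_html(url), css)
  links$country <- rep(country_code, nrow(links))
  links
}

# Level 1.1.1: overview text of one project.
# `css` for the overview block is unknown and must be supplied by the caller.
get_overview <- function(project_href, css, delay = 5) {
  Sys.sleep(delay)
  url <- xml2::url_absolute(project_href, "http://projects.worldbank.org/")
  html_text(html_node(read_html(url), css), trim = TRUE)
}

# Offline demo on a hand-written fragment (hypothetical markup and project IDs).
demo_page <- read_html('
  <table>
    <tr><td class="grid_20"><a href="/P000001/?lang=en&amp;tab=overview">Project A</a></td></tr>
    <tr><td class="grid_20"><a href="/P000002/?lang=en&amp;tab=overview">Project B</a></td></tr>
  </table>')

projects <- extract_links(demo_page, ".grid_20 a")
print(projects)

# Live use would look something like this (codes and selector are hypothetical):
# all_projects <- purrr::map_df(c("XX", "YY"), get_projects)
# all_projects$overview <- purrr::map_chr(all_projects$url, get_overview,
#                                         css = ".overview-selector")
```

Keeping the extraction step (extract_links) separate from the fetching step makes it possible to test selectors on a saved copy of a page before sending any requests to the server.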