Thanks to Jim's prompting, I think I came up with a fairly painless way to pull the image path out of the HTML without having to write any parsing code myself, using the function getHTMLExternalFiles() in the XML package.
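The essence of it: getHTMLExternalFiles() parses the page and returns the relative paths of the files it references (the peak-flow .gif among them), and those just need the server URL pasted on the front before being handed to download.file(). A minimal sketch for a single gauge is below; the site number is the example from the thread, and the grep() filter for '.gif' is only a precaution I added in case the page references other files:

require(XML)

site <- '12144500'   # example gauge from the thread

graphic.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
                     site, '&agency_cd=USGS&format=img', sep = "")

# Parse the page and pull out the paths of the files it references
img.paths <- getHTMLExternalFiles(graphic.url)

# Keep only the graphic (precautionary filter, see note above)
img.path <- img.paths[grep('\\.gif$', img.paths)][1]

# The path is relative, so prepend the server URL before downloading
img.url <- paste('http://nwis.waterdata.usgs.gov', img.path, sep = "")
download.file(img.url, paste('graphic_', site, '.gif', sep = ""), mode = 'wb')

A working version of the full script follows: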
## Code to process USGS peak flow data

require(dataRetrieval)
require(XML)

## Need to start with list of gauge ids to process
siteno <- c('12142000','12134500','12149000')

lstas <- length(siteno)  # length of locator list

print(paste('Processing...', siteno[1], sep = ""))

datall <- readNWISpeak(siteno[1])

for (a in 2:lstas) {
  # Print station being processed
  print(paste('Processing...', siteno[a], sep = ""))

  dat <- readNWISpeak(siteno[a])

  datall <- rbind(datall, dat)
}

write.csv(datall, file = "usgs_peaks.csv")

# Retrieve ascii text files and graphics

for (a in 1:lstas) {
  print(paste('Processing...', siteno[a], sep = ""))

  # Page that generates the peak-flow graphic for this gauge
  graphic.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
                       siteno[a], '&agency_cd=USGS&format=img', sep = "")

  # getHTMLExternalFiles() returns the relative path(s) of the files the
  # page references; prepend the server URL to get the image address
  usgs.img    <- getHTMLExternalFiles(graphic.url)
  graphic.img <- paste('http://nwis.waterdata.usgs.gov', usgs.img, sep = "")

  peakfq.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
                      siteno[a], '&agency_cd=USGS&format=hn2', sep = "")
  tab.url    <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
                      siteno[a], '&agency_cd=USGS&format=rdb', sep = "")

  graphic.fn <- paste('graphic_', siteno[a], '.gif', sep = "")
  peakfq.fn  <- paste('peakfq_', siteno[a], '.txt', sep = "")
  tab.fn     <- paste('tab_', siteno[a], '.txt', sep = "")

  download.file(graphic.img, graphic.fn, mode = 'wb')
  download.file(peakfq.url, peakfq.fn)
  download.file(tab.url, tab.fn)
}

> ------------------------------
>
> Message: 34
> Date: Fri, 5 Jun 2015 08:59:04 +1000
> From: Jim Lemon <drjimle...@gmail.com>
> To: Curtis DeGasperi <curtis.degasp...@gmail.com>
> Cc: r-help mailing list <r-help@r-project.org>
> Subject: Re: [R] web scraping image
> Message-ID:
>   <ca+8x3fv0ajw+e22jayv1gfm6jr_tazua5fwgd3t_mfgfqy2...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hi Chris,
> I don't have the packages you are using, but tracing this indicates
> that the page source contains the relative path of the graphic, in
> this case:
>
> /nwisweb/data/img/USGS.12144500.19581112.20140309..0.peak.pres.gif
>
> and you already have the server URL:
>
> nwis.waterdata.usgs.gov
>
> getting the path out of the page source isn't difficult, just split
> the text at double quotes and get the token following "img src=". If I
> understand the arguments of "download.file" correctly, the path is the
> graphic.fn argument and the server URL is the graphic.url argument. I
> would paste them together and display the result to make sure that it
> matches the image you want. When I did this, the correct image
> appeared in my browser. I'm using Google Chrome, so I don't have to
> prepend the http://
>
> Jim
>
> On Fri, Jun 5, 2015 at 2:31 AM, Curtis DeGasperi
> <curtis.degasp...@gmail.com> wrote:
>> I'm working on a script that downloads data from the USGS NWIS server.
>> dataRetrieval makes it easy to quickly get the data in a neat tabular
>> format, but I was also interested in getting the tabular text files -
>> also fairly easy for me using download.file.
>>
>> However, I'm not skilled enough to work out how to download the nice
>> graphic files that can be produced dynamically from the USGS NWIS
>> server (for example:
>> http://nwis.waterdata.usgs.gov/nwis/peak?site_no=12144500&agency_cd=USGS&format=img )
>>
>> My question is how do I get the image from this web page and save it
>> to a local directory? scrapeR returns the information from the page
>> and I suspect this is a possible solution path, but I don't know what
>> the next step is.
>>
>> My code provided below works from a list I've created of USGS flow
>> gauging stations.
>>
>> Curtis
>>
>> ## Code to process USGS daily flow data for high and low flow analysis
>> ## Need to start with list of gauge ids to process
>> ## Can't figure out how to automate download of images
>>
>> require(dataRetrieval)
>> require(data.table)
>> require(scrapeR)
>>
>> df <- read.csv("usgs_stations.csv", header=TRUE)
>>
>> lstas <- length(df$siteno)  # length of locator list
>>
>> print(paste('Processsing...',df$name[1],' ',df$siteno[1], sep = ""))
>>
>> datall <- readNWISpeak(df$siteno[1])
>>
>> for (a in 2:lstas) {
>>   # Print station being processed
>>   print(paste('Processsing...',df$name[a],' ',df$siteno[a], sep = ""))
>>
>>   dat <- readNWISpeak(df$siteno[a])
>>
>>   datall <- rbind(datall,dat)
>> }
>>
>> write.csv(datall, file = "usgs_peaks.csv")
>>
>> # Retrieve ascii text files and graphics
>>
>> for (a in 1:lstas) {
>>
>>   print(paste('Processsing...',df$name[1],' ',df$siteno[1], sep = ""))
>>
>>   graphic.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',df$siteno[a],'&agency_cd=USGS&format=img', sep = "")
>>   peakfq.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',df$siteno[a],'&agency_cd=USGS&format=hn2', sep = "")
>>   tab.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',df$siteno[a],'&agency_cd=USGS&format=rdb', sep = "")
>>
>>   graphic.fn <- paste('graphic_',df$siteno[a],'.gif', sep = "")
>>   peakfq.fn <- paste('peakfq_',df$siteno[a],'.txt', sep = "")
>>   tab.fn <- paste('tab_',df$siteno[a],'.txt', sep = "")
>>
>>   download.file(graphic.url,graphic.fn,mode='wb') # This apparently doesn't work - file is empty
>>   download.file(peakfq.url,peakfq.fn)
>>   download.file(tab.url,tab.fn)
>> }
>>
>> # scrapeR
>> pageSource <- scrape(url="http://nwis.waterdata.usgs.gov/nwis/peak?site_no=12144500&agency_cd=USGS&format=img", headers=TRUE, parse=FALSE)
>> page <- scrape(object="pageSource")
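For completeness, here is roughly what Jim's base-R suggestion (split the page source at double quotes and take the token following "img src=") might look like. This is only a sketch of his description, not something I have tested against the live page, and it assumes the peak-flow graphic is the first img tag on the page:

graphic.url <- 'http://nwis.waterdata.usgs.gov/nwis/peak?site_no=12144500&agency_cd=USGS&format=img'

# Read the page source and split it at double quotes
src    <- paste(readLines(graphic.url), collapse = ' ')
tokens <- strsplit(src, '"', fixed = TRUE)[[1]]

# The token immediately after the one containing 'img src=' is the
# relative path of the graphic
img.path <- tokens[grep('img src=', tokens, fixed = TRUE)[1] + 1]

# Paste the server URL onto the relative path and download the image
img.url <- paste('http://nwis.waterdata.usgs.gov', img.path, sep = "")
download.file(img.url, 'graphic_12144500.gif', mode = 'wb')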