I have done the following using readLines:

directory <- "~/"
files <- list.files(directory)
data_frames <- vector("list", length(files))
for (i in seq_along(files)) {
  df <- readLines(file.path(directory, files[i]))
  df <- df[-(1:13)]
  df <- data.frame(year = substr(df, 1, 4),
                   month = substr(df, 6, 7),
                   day = substr(df, 9, 10),
                   hour = substr(df, 12, 13),
                   temp = substr(df, 21, 27))
  data_frames[[i]] <- df
}
What I have been having trouble with is adding the following information from the cities file (100 cities) to each of the downloaded data files. I would like to do the following, but automatically:

###
mydata$city <- rep(cities[1,1], nrow(mydata))
mydata$state <- rep(cities[1,2], nrow(mydata))
mydata$lon <- rep(cities[1,3], nrow(mydata))
mydata$lat <- rep(cities[1,4], nrow(mydata))
###

The information for the cities looks like this:

###
cities <- dput(droplevels(head(cities, 5)))
structure(list(city = structure(1:5, .Label = c("Boston", "Bridgeport",
"Cambridge", "Fall River", "Hartford"), class = "factor"), state = structure(c(2L,
1L, 2L, 2L, 1L), .Label = c(" CT ", " MA "), class = "factor"),
lon = c(-71.06, -73.19, -71.11, -71.16, -72.67), lat = c(42.36,
41.18, 42.37, 41.7, 41.77)), .Names = c("city", "state",
"lon", "lat"), row.names = c(NA, 5L), class = "data.frame")
###

Apologies if this seems trivial but I have been having a hard time. Thank you again.

Sincerely,

Milu

On Mon, Oct 16, 2017 at 7:13 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> > On Oct 15, 2017, at 3:35 PM, Miluji Sb <miluj...@gmail.com> wrote:
> >
> > Dear David,
> >
> > This is amazing, thank you so much. If I may ask another question:
> >
> > The output looks like the following:
> >
> > ###
> > dput(head(x,15))
> > c("Metadata for Requested Time Series:", "", "prod_name=GLDAS_NOAH025_3H_v2.0",
> > "param_short_name=Tair_f_inst", "param_name=Near surface air temperature",
> > "unit=K", "begin_time=1970-01-01T00", "end_time=1979-12-31T21",
> > "lat= 42.36", "lon=-71.06", "Request_time=2017-10-15 22:20:03 GMT",
> > "", "Date&Time Data", "1970-01-01T00:00:00\t267.769",
> > "1970-01-01T03:00:00\t264.595")
> > ###
> >
> > Thus I need to drop the first 13 rows and do the following to add identifying information:
>
> Are you having difficulty reading in the data from disk? The `read.table` function has a "skip" parameter.
> >
> > ###
> > mydata <- data.frame(year = substr(x,1,4),
>
> That would not appear to do anything useful with x. The `x` object is not a long string. The items you want are in separate elements of x.
>
> substr(x,1,4) # now returns
>  [1] "Meta" ""     "prod" "para" "para" "unit" "begi" "end_" "lat=" "lon=" "Requ" ""     "Date"
> [14] "1970" "1970"
>
> You need to learn basic R indexing. The year might be extracted from the 7th element of x via code like this:
>
> year <- substr( x[7], 1,4)
>
> > month = substr(x, 6,7),
> > day = substr(x, 9, 10),
> > hour = substr(x, 12, 13),
> > temp = substr(x, 21, 27))
>
> The time and temp items would naturally be read in with read.table (or in the case of tab-delimited data with read.delim) after skipping the first 14 lines.
>
> >
> > mydata$city <- rep(cities[1,1], nrow(mydata))
>
> There's no need to use `rep` with data.frame. If one argument to data.frame is length n then all single element arguments will be "recycled" to fill in the needed number of rows. Please take the time to work through all the pages of "Introduction to R" (shipped with all distributions of R) or pick another introductory text. We cannot provide tutoring to all students. You need to put in the needed self-study first.
>
> --
> David.
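
The two points made above (skipping the header block when reading, and letting data.frame recycle single values) might be sketched roughly as follows. This is only an illustration under stated assumptions: the sample lines are copied from the dput(head(x,15)) output quoted earlier, skip = 13 assumes the header block is always exactly 13 lines as in that sample, and the column names `datetime` and `temp` as well as the city values typed in by hand are chosen just for this example.

```r
# Sample of one downloaded file, copied from the dput() output in the thread.
x <- c("Metadata for Requested Time Series:", "",
       "prod_name=GLDAS_NOAH025_3H_v2.0",
       "param_short_name=Tair_f_inst",
       "param_name=Near surface air temperature",
       "unit=K", "begin_time=1970-01-01T00", "end_time=1979-12-31T21",
       "lat= 42.36", "lon=-71.06", "Request_time=2017-10-15 22:20:03 GMT",
       "", "Date&Time Data",
       "1970-01-01T00:00:00\t267.769",
       "1970-01-01T03:00:00\t264.595")

# Assumption: the metadata header is 13 lines long, as in the sample above.
# For a file on disk, the file path would be given in place of text = x.
dat <- read.delim(text = x, skip = 13, header = FALSE,
                  col.names = c("datetime", "temp"),
                  stringsAsFactors = FALSE)

# Single values (city, state, lon, lat) are recycled to the number of rows,
# so rep() is not needed. The values here are typed in by hand for Boston.
mydata <- data.frame(year  = substr(dat$datetime, 1, 4),
                     month = substr(dat$datetime, 6, 7),
                     day   = substr(dat$datetime, 9, 10),
                     hour  = substr(dat$datetime, 12, 13),
                     temp  = dat$temp,
                     city  = "Boston",
                     state = "MA",
                     lon   = -71.06,
                     lat   = 42.36)
```

Reading with read.delim also gives a numeric temp column, whereas substr() on the raw lines returns character strings.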
> >
> > mydata$state <- rep(cities[1,2], nrow(mydata))
> > mydata$lon <- rep(cities[1,3], nrow(mydata))
> > mydata$lat <- rep(cities[1,4], nrow(mydata))
> > ###
> >
> > Is it possible to incorporate these into your code so the data looks like this:
> >
> > dput(droplevels(head(mydata)))
> > structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "1970", class = "factor"),
> > month = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "01", class = "factor"),
> > day = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "01", class = "factor"),
> > hour = structure(1:6, .Label = c("00", "03", "06", "09",
> > "12", "15"), class = "factor"), temp = structure(c(6L, 4L,
> > 2L, 1L, 3L, 5L), .Label = c("261.559", "262.525", "262.648",
> > "264.595", "265.812", "267.769"), class = "factor"), city = structure(c(1L,
> > 1L, 1L, 1L, 1L, 1L), .Label = "Boston", class = "factor"),
> > state = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = " MA ", class = "factor"),
> > lon = c(-71.06, -71.06, -71.06, -71.06, -71.06, -71.06),
> > lat = c(42.36, 42.36, 42.36, 42.36, 42.36, 42.36)), .Names = c("year",
> > "month", "day", "hour", "temp", "city", "state", "lon", "lat"
> > ), row.names = c(NA, 6L), class = "data.frame")
> >
> > Apologies for asking repeated questions and thank you again!
>
> Of course it's possible. I don't understand where the difficulty lies.
>
> >
> > Sincerely,
> >
> > Milu
> >
> > On Sun, Oct 15, 2017 at 11:45 PM, David Winsemius <dwinsem...@comcast.net> wrote:
> >
> > > On Oct 15, 2017, at 2:02 PM, Miluji Sb <miluj...@gmail.com> wrote:
> > >
> > > Dear all,
> > >
> > > I am trying to download time-series climatic data from GES DISC (NASA)
> > > Hydrology Data Rods web-service. Unfortunately, no wget method is
> > > available.
> > >
> > > Five parameters are needed for data retrieval: variable, location,
> > > startDate, endDate, and type.
> > > For example:
> > >
> > > ###
> > > https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(-71.06,%2042.36)&type=asc2
> > > ###
> > >
> > > In this case, variable: Tair_f_inst (temperature), location: (-71.06,
> > > 42.36), startDate: 01 January 1970; endDate: 31 December 1979; type: asc2
> > > (output 2-column ASCII).
> > >
> > > I am trying to download data for 100 US cities, data for which I have in
> > > the following data.frame:
> > >
> > > ###
> > > cities <- dput(droplevels(head(cities, 5)))
> > > structure(list(city = structure(1:5, .Label = c("Boston", "Bridgeport",
> > > "Cambridge", "Fall River", "Hartford"), class = "factor"), state =
> > > structure(c(2L,
> > > 1L, 2L, 2L, 1L), .Label = c(" CT ", " MA "), class = "factor"),
> > > lon = c(-71.06, -73.19, -71.11, -71.16, -72.67), lat = c(42.36,
> > > 41.18, 42.37, 41.7, 41.77)), .Names = c("city", "state",
> > > "lon", "lat"), row.names = c(NA, 5L), class = "data.frame")
> > > ###
> > >
> > > Is it possible to download the data for the multiple locations
> > > automatically (e.g. RCurl) and save them as csv? Essentially, reading
> > > coordinates from the data.frame and entering it in the URL.
> > >
> > > I would also like to add identifying information to each of the data files
> > > from the cities data.frame.
> > > I have been doing the following for a single
> > > file:
> >
> > Didn't seem that difficult:
> >
> > library(downloader) # makes things easier for Macs, perhaps not needed
> > # if not used will need to use download.file
> >
> > for( i in 1:5) {
> >   target1 <- paste0("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(",
> >                     cities[i, "lon"],
> >                     ",%20", cities[i,"lat"],
> >                     ")&type=asc2")
> >   target2 <- paste0("~/", # change for whatever destination directory you may prefer.
> >                     cities[i,"city"],
> >                     cities[i,"state"], ".asc")
> >   download(url=target1, destfile=target2)
> > }
> >
> > Now I have 5 named files with extensions ".asc" in my user directory (since I'm on a Mac). It is a slow website so patience is needed.
> >
> > --
> > David
> >
> > >
> > > ###
> > > x <- readLines(con=url("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(-71.06,%2042.36)&type=asc2"))
> > > x <- x[-(1:13)]
> > >
> > > mydata <- data.frame(year = substr(x,1,4),
> > > month = substr(x, 6,7),
> > > day = substr(x, 9, 10),
> > > hour = substr(x, 12, 13),
> > > temp = substr(x, 21, 27))
> > >
> > > mydata$city <- rep(cities[1,1], nrow(mydata))
> > > mydata$state <- rep(cities[1,2], nrow(mydata))
> > > mydata$lon <- rep(cities[1,3], nrow(mydata))
> > > mydata$lat <- rep(cities[1,4], nrow(mydata))
> > > ###
> > >
> > > Help and advice would be greatly appreciated. Thank you!
> > >
> > > Sincerely,
> > >
> > > Milu
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
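
One way the city information could be attached to every downloaded file automatically is sketched below. It is a rough illustration, not a tested solution for the real files. It assumes that each file is named by pasting city and state together with ".asc" (as in the download loop quoted above), that every file has a 13-line header, and that row i of cities belongs to file i. To keep the example self-contained it first writes two small stand-in files into a temporary directory; the helper `read_city` and the objects `header` and `all_data` are hypothetical names used only here.

```r
# Two rows of the cities data, typed in from the dput() output in the thread.
cities <- data.frame(city  = c("Boston", "Bridgeport"),
                     state = c(" MA ", " CT "),
                     lon   = c(-71.06, -73.19),
                     lat   = c(42.36, 41.18),
                     stringsAsFactors = FALSE)

# Stand-in for the real downloads: small files with a 13-line header.
directory <- file.path(tempdir(), "asc_demo")
dir.create(directory, showWarnings = FALSE)
header <- c("Metadata for Requested Time Series:", rep("", 11), "Date&Time\tData")
for (i in seq_len(nrow(cities))) {
  writeLines(c(header,
               "1970-01-01T00:00:00\t267.769",
               "1970-01-01T03:00:00\t264.595"),
             file.path(directory,
                       paste0(cities$city[i], cities$state[i], ".asc")))
}

# Read the file for row i of cities and add that row's identifying columns.
read_city <- function(i) {
  path <- file.path(directory,
                    paste0(as.character(cities$city[i]),
                           as.character(cities$state[i]), ".asc"))
  dat <- read.delim(path, skip = 13, header = FALSE,
                    col.names = c("datetime", "temp"),
                    stringsAsFactors = FALSE)
  # Single values are recycled to the number of rows in dat.
  data.frame(year  = substr(dat$datetime, 1, 4),
             month = substr(dat$datetime, 6, 7),
             day   = substr(dat$datetime, 9, 10),
             hour  = substr(dat$datetime, 12, 13),
             temp  = dat$temp,
             city  = as.character(cities$city[i]),
             state = trimws(as.character(cities$state[i])),
             lon   = cities$lon[i],
             lat   = cities$lat[i],
             stringsAsFactors = FALSE)
}

data_frames <- lapply(seq_len(nrow(cities)), read_city)
all_data <- do.call(rbind, data_frames)

# Optionally save the combined data as csv.
write.csv(all_data, file.path(directory, "all_cities.csv"), row.names = FALSE)
```

Building each file path from the cities row, rather than looping over list.files(), keeps the link between a file and its city explicit and avoids picking up unrelated files in the directory.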