First off, thanks SO much Eric and Micha for your help on this problem! I think Micha's spatially-oriented solution with Eric's slight modifications will work best for my application but there is one snag (see the commented section near the end of the following code)-- basically I don't know how to apply the lapply operator to a list with a variable length (namely the length of my input csv files) rather than the fixed length that Eric used:
library(sf) library(spdep) data<- read.csv("R_find_pts_testdata.csv") MAX_DIST <- 0.05 #50 m in km, the units of dnearneigh when coords are in degrees pts <- st_as_sf(data, coords=c('LON', 'LAT'), crs=4326) dist_matrix <- dnearneigh(pts, 0, MAX_DIST, use_s2=TRUE) #Micha's function to get the minimum Conc value among all points #within the buffer distance to a given single point: # Function to get minimum Conc values for one row of distance matrix MinConc <- function(x, lst, pts) { Concs <- lapply(lst, function(p) { pts$Conc[p] }) return(min(Concs[[1]])) } # above, x is an index to a single point, lst is a list of point indices #from distance matrix within the buffer distance #Next run function on all points to get list of minimum Conc #values for all points, and merge back to pts. #...modified by Eric to include original point return(min(c(Concs[[1]], pts$Conc[x]))) # Now apply this function to all points in pts ###This is where the problem is, I think: ###Eric had used N <- 1000L and later ###Conc_min <- lapply(1:N, function(i) { ### MinConc(i, dist_matrix[i], pts)}) ###But I need the length of the list the function is applied to ###to be variable, depending on the length of the input csv file ###unlike the dummy variable dataframe that Eric used with a set length ###So I changed the "x" argument in lapply to "pts$X" but that generates an empty list Conc_min <- lapply(pts$X, function(i){ MinConc(i, dist_matrix[i], pts) #}) Conc_min <- data.frame("Conc_min" = as.integer(Conc_min)) # Add back as new attrib to original points sf object pts_with_min <- do.call(cbind, c(pts, Conc_min)) Many thanks again for your help on this! Best regards, -Tiffany ________________________________ From: Micha Silver <tsvi...@gmail.com> Sent: Monday, November 7, 2022 8:11 AM To: Duhl, Tiffany R. <tiffany.d...@tufts.edu>; Eric Berger <ericjber...@gmail.com> Cc: R-help <R-help@r-project.org> Subject: Re: [R] [External] Re: Selecting a minimum value of an attribute associated with point values neighboring a given point and assigning it as a new attribute Eric's solution notwithstanding, here's a more "spatial" approach. I first create a fictitious set of 1000 points (and save to CSV to replicate your workflow) library(sf) library(spdep) # Prepare fictitious data # Create a data.frame with 1000 random points, and save to CSV LON <- runif(1000, -70.0, -69.0) LAT <- runif(1000, 42.0, 43.0) Conc <- runif(1000, 90000, 100000) df <- data.frame(LON, LAT, Conc) csv_file = "/tmp/pts_testdata.csv" write.csv(df, csv_file) Now read that CSV back in directly as an sf object (No need for the old SpatialPointsDataFrame). THen create a distance matrix for all points, which contains indicies to those points within a certain buffer distance, just as you did in your example. # Read back in as sf object, including row index pts <- st_as_sf(read.csv(csv_file), coords=c('LON', 'LAT'), crs=4326) dist_matrix <- dnearneigh(pts, 0, 100, use_s2=TRUE) # use_s2 since these are lon/lat Now I prepare a function to get the minimum Conv value among all points within the buffer distance to a given single point: # Function to get minimum Conc values for one row of distance matrix MinConc <- function(x, lst, pts) { # x is an index to a single point, # lst is a list of point indices from distance matrix # that are within the buffer distance Concs <- lapply(lst, function(p) { pts$Conc[p] }) return(min(Concs[[1]])) } Next run that function on all points to get a list of minimum Conv values for all points, and merge back to pts. # Now apply this function to all points in pts Conc_min <- lapply(pts$X, function(i){ MinConc(i, dist_matrix[i], pts) }) Conc_min <- data.frame("Conc_min" = as.integer(Conc_min)) # Add back as new attrib to original points sf object pts_with_min <- do.call(cbind, c(pts, Conc_min)) HTH, Micha On 06/11/2022 18:40, Duhl, Tiffany R. wrote: > Thanks so much Eric! > > I'm going to play around with your toy code (pun intended) & see if I can > make it work for my application. > > Cheers, > -Tiffany > ________________________________ > From: Eric Berger <ericjber...@gmail.com> > Sent: Sunday, November 6, 2022 10:27 AM > To: Bert Gunter <bgunter.4...@gmail.com> > Cc: Duhl, Tiffany R. <tiffany.d...@tufts.edu>; R-help <R-help@r-project.org> > Subject: [External] Re: [R] Selecting a minimum value of an attribute > associated with point values neighboring a given point and assigning it as a > new attribute > > Whoops ... left out a line in Part 2. Resending with the correction > > ## PART 2: You can use this code on the real data with f() defined > appropriately > A <- matrix(0,N,N) > v <- 1:N > ## get the indices (j,k) where j < k (as columns in a data.frame) > idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k) > u <- sapply(1:nrow(idx), > \(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<- f(j,k,myData) }) > B <- A + t(A) + diag(N) > C <- diag(myData$Conc) > D <- B %*% C > D[D==0] <- NA > myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)}) > print(head(myData)) > > On Sun, Nov 6, 2022 at 5:19 PM Eric Berger <ericjber...@gmail.com> wrote: >> Hi Tiffany, >> Here is some code that might help with your problem. I solve a "toy" >> problem that is conceptually the same. >> Part 1 sets up my toy problem. You would have to replace Part 1 with >> your real case. The main point is to define >> a function f(i, j, data) which returns 0 or 1 depending on whether the >> observations in rows i and j in your dataset 'data' >> are within your cutoff distance (i.e. 50m). >> >> You can then use Part 2 almost without changes (except for changing >> 'myData' to the correct name of your data). >> >> I hope this helps, >> Eric >> >> library(dplyr) >> >> ## PART 1: create fake data for minimal example >> set.seed(123) ## for reproducibility >> N <- 5 ## replace by number of locations (approx 9000 in your case) >> MAX_DISTANCE <- 2 ## 50 in your case >> myData <- data.frame(x=rnorm(N),y=rnorm(N),Conc=sample(1:N,N)) >> >> ## The function which you must re-define for your actual case. >> f <- function(i,j,a) { >> dist <- sqrt(sum((a[i,1:2] - a[j,1:2])^2)) ## Euclidean distance >> as.integer(dist < MAX_DISTANCE) >> } >> >> ## PART 2: You can use this code on the real data with f() defined >> appropriately >> A <- matrix(0,N,N) >> ## get the indices (j,k) where j < k (as columns in a data.frame) >> idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k) >> u <- sapply(1:nrow(idx),\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<- >> f(j,k,myData) }) >> B <- A + t(A) + diag(N) >> C <- diag(myData$Conc) >> D <- B %*% C >> D[D==0] <- NA >> myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)}) >> print(head(myData)) >> >> >> On Sat, Nov 5, 2022 at 5:14 PM Bert Gunter <bgunter.4...@gmail.com> wrote: >>> Probably better posted on R-sig-geo. >>> >>> -- Bert >>> >>> On Sat, Nov 5, 2022 at 12:36 AM Duhl, Tiffany R. <tiffany.d...@tufts.edu> >>> wrote: >>> >>>> Hello, >>>> >>>> I have sets of spatial points with LAT, LON coords (unprojected, WGS84 >>>> datum) and several value attributes associated with each point, from >>>> numerous csv files (with an average of 6,000-9,000 points in each file) as >>>> shown in the following example: >>>> >>>> data<- read.csv("R_find_pts_testdata.csv") >>>> >>>>> data >>>> ID Date Time LAT LON Conc >>>> Leg.Speed CO2 H2O BC61 Hr Min Sec >>>> 1 76 4/19/2021 21:25:38 42.40066 -70.98802 99300 0.0 mph 428.39 9.57 >>>> 578 21 25 38 >>>> 2 77 4/19/2021 21:25:39 42.40066 -70.98802 96730 0.0 mph 428.04 9.57 >>>> 617 21 25 39 >>>> 3 79 4/19/2021 21:25:41 42.40066 -70.98802 98800 0.2 mph 427.10 9.57 >>>> 1027 21 25 41 >>>> 4 80 4/19/2021 21:25:42 42.40066 -70.98802 96510 2 mph 427.99 9.58 >>>> 1381 21 25 42 >>>> 5 81 4/19/2021 21:25:43 42.40067 -70.98801 95540 3 mph 427.99 9.58 >>>> 1271 21 25 43 >>>> 6 82 4/19/2021 21:25:44 42.40068 -70.98799 94720 4 mph 427.20 9.57 >>>> 910 21 25 44 >>>> 7 83 4/19/2021 21:25:45 42.40069 -70.98797 94040 5 mph 427.18 9.57 >>>> 652 21 25 45 >>>> 8 84 4/19/2021 21:25:46 42.40072 -70.98795 95710 7 mph 427.07 9.57 >>>> 943 21 25 46 >>>> 9 85 4/19/2021 21:25:47 42.40074 -70.98792 96200 8 mph 427.44 9.56 >>>> 650 21 25 47 >>>> 10 86 4/19/2021 21:25:48 42.40078 -70.98789 93750 10 mph 428.76 9.57 >>>> 761 21 25 48 >>>> 11 87 4/19/2021 21:25:49 42.40081 -70.98785 93360 11 mph 429.25 9.56 >>>> 1158 21 25 49 >>>> 12 88 4/19/2021 21:25:50 42.40084 -70.98781 94340 12 mph 429.56 9.57 >>>> 107 21 25 50 >>>> 13 89 4/19/2021 21:25:51 42.40087 -70.98775 92780 12 mph 428.62 9.56 >>>> 720 21 25 51 >>>> >>>> >>>> What I want to do is, for each point, identify all points within 50m of >>>> that point, find the minimum value of the "Conc" attribute of each nearby >>>> set of points (including the original point) and then create a new variable >>>> ("Conc_min") and assign this minimum value to a new variable added to >>>> "data". >>>> >>>> So far, I have the following code: >>>> >>>> library(spdep) >>>> library(sf) >>>> >>>> setwd("C:\\mydirectory\\") >>>> data<- read.csv("R_find_pts_testdata.csv") >>>> >>>> #make sure the data is a data frame >>>> pts <- data.frame(data) >>>> >>>> #create spatial data frame and define projection >>>> pts_coords <- cbind(pts$LON, pts$LAT) >>>> data_pts <- SpatialPointsDataFrame(coords= pts_coords, >>>> data=pts, proj4string = CRS("+proj=longlat +datum=WGS84")) >>>> >>>> #Re-project to WGS 84 / UTM zone 18N, so the analysis is in units of m >>>> ptsUTM <- sf::st_as_sf(data_pts, coords = c("LAT", "LON"), remove = F)%>% >>>> st_transform(32618) >>>> >>>> #create 50 m buffer around each point then intersect with points and >>>> finally find neighbors within the buffers >>>> pts_buf <- sf::st_buffer(ptsUTM, 50) >>>> coords <- sf::st_coordinates(ptsUTM) >>>> int <- sf::st_intersects(pts_buf, ptsUTM) >>>> x <- spdep::dnearneigh(coords, 0, 50) >>>> >>>> Now at this point, I'm not sure what to either the "int" (a sgbp list) or >>>> "x" (nb object) objects (or even if I need them both) >>>> >>>>> int >>>> Sparse geometry binary predicate list of length 974, where the predicate >>>> was `intersects' >>>> first 10 elements: >>>> 1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 3: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 4: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 7: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 8: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> 9: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... >>>> >>>>> x >>>> Neighbour list object: >>>> Number of regions: 974 >>>> Number of nonzero links: 34802 >>>> Percentage nonzero weights: 3.668481 >>>> Average number of links: 35.73101 >>>> >>>> One thought is that maybe I don't need the dnearneigh function and can >>>> instead convert "int" into a dataframe and somehow merge or associate >>>> (perhaps with an inner join) the ID fields of the buffered and intersecting >>>> points and then compute the minimum value of "Conc" grouping by ID: >>>> >>>>> as.data.frame(int) >>>> row.id col.id >>>> 1 1 1 >>>> 2 1 2 >>>> 3 1 3 >>>> 4 1 4 >>>> 5 1 5 >>>> 6 1 6 >>>> 7 1 7 >>>> 8 1 8 >>>> 9 1 9 >>>> 10 1 10 >>>> 11 1 11 >>>> 12 1 12 >>>> 13 1 13 >>>> 14 1 14 >>>> 15 1 15 >>>> 16 1 16 >>>> 17 1 17 >>>> 18 1 18 >>>> 19 2 1 >>>> 20 2 2 >>>> 21 2 3 >>>> 22 2 4 >>>> 23 2 5 >>>> 24 2 6 >>>> 25 2 7 >>>> 26 2 8 >>>> 27 2 9 >>>> 28 2 10 >>>> >>>> >>>> So in the above example I'd like to take the minimum of "Conc" among the >>>> col.id points grouped with row.id 1 (i.e., col.ids 1-18) and assign the >>>> minimum value of this group as a new variable in data (Data$Conc_min), and >>>> do the same for row.id 2 and all the rest of the rows. >>>> >>>> I'm just not sure how to do this and I appreciate any help folks might >>>> have on this matter! >>>> >>>> Many thanks, >>>> -Tiffany >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > Caution: This message originated from outside of the Tufts University > organization. Please exercise caution when clicking links or opening > attachments. When in doubt, email the TTS Service Desk at > i...@tufts.edu<mailto:i...@tufts.edu> or call them directly at 617-627-3376. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Micha Silver Ben Gurion Univ. Sde Boker, Remote Sensing Lab cell: +972-523-665918 [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.