Hi, I am bootstrapping, but my loops are taking far too long and I need to make them faster. Looking through the R-help archive, I suspect the cause may be that I don't specify the size of my data.frame up front, mainly because I don't know in advance how large it has to be. Can anyone help?
My data looks like this (first 5 entries of 'SpeyBay'):

  Year JulianDay Hour Day Month Quarter Season SeaState Visibility TideState
1 2005        91    6   1     4       2      2        2          2      2.18
2 2005        91    7   1     4       2      2        2          2      1.53
3 2005        91    9   1     4       2      2        2          3      0.80
4 2005        91   11   1     4       2      2        2          4      0.96
5 2005        91   14   1     4       2      2        1          6      2.25
  TideHeight CetPres Segment
1          2       0       1
2          3       0       1
3          5       0       2
4         -5       0       3
5         -2       0       4

I am bootstrapping 1000 times, but re-sampling on Segment (since my data is autocorrelated). This means I am trying to reconstruct my data from randomly drawn segments, e.g. segment 3, then segment 1, each of which may include from 1-14 data rows. So I don't know in advance how many rows I am going to get.

When I run my for loop, I just use rbind to grow a new variable of undefined size, e.g. 'TempD2', and I suspect it is this that is slowing down the whole process (probably made worse by having a for loop within a for loop).

Can anyone give me any advice on how to pre-define a data frame (if this is what the data shown above is) that can have an undefined size, or how to make it big enough to take all the data? I've been trying to figure this out for ages with no luck and I'm sure it's something simple! My code is shown below. Any tips on making it faster would be greatly appreciated: the last run took several hours, which is just not practical!
Many thanks in advance,
Clare Embling

CODE:

SpringWatch <- 504
SummerWatch <- 704
AutumnWatch <- 392
MaxSample <- 704
signif <- 0

for(j in 1:1000){
  # resampling 2 different years (D & E) in 3 different seasons (2, 3 & 4) separately
  D2S <- sample(D2Start:D2Stop, MaxSample, replace=T)
  D3S <- sample(D3Start:D3Stop, MaxSample, replace=T)
  D4S <- sample(D4Start:D4Stop, MaxSample, replace=T)
  E2S <- sample(E2Start:E2Stop, MaxSample, replace=T)
  E3S <- sample(E3Start:E3Stop, MaxSample, replace=T)
  E4S <- sample(E4Start:E4Stop, MaxSample, replace=T)

  # Creating new data frames with the first sampled segment
  TempD2 <- SpeyBay[(Segment==D2S[1]),]
  TempD3 <- SpeyBay[(Segment==D3S[1]),]
  TempD4 <- SpeyBay[(Segment==D4S[1]),]
  TempE2 <- SpeyBay[(Segment==E2S[1]),]
  TempE3 <- SpeyBay[(Segment==E3S[1]),]
  TempE4 <- SpeyBay[(Segment==E4S[1]),]

  # loop to add together all the rows of data for each segment sampled
  for(i in 2:MaxSample) {
    TempD2 <- rbind(TempD2, SpeyBay[(Segment==D2S[i]),])
    TempD3 <- rbind(TempD3, SpeyBay[(Segment==D3S[i]),])
    TempD4 <- rbind(TempD4, SpeyBay[(Segment==D4S[i]),])
    TempE2 <- rbind(TempE2, SpeyBay[(Segment==E2S[i]),])
    TempE3 <- rbind(TempE3, SpeyBay[(Segment==E3S[i]),])
    TempE4 <- rbind(TempE4, SpeyBay[(Segment==E4S[i]),])
  }

  # But actually I only want a certain number of rows of data...
  NewD2 <- TempD2[1:SpringWatch,]
  NewD3 <- TempD3[1:SummerWatch,]
  NewD4 <- TempD4[1:AutumnWatch,]
  NewE2 <- TempE2[1:SpringWatch,]
  NewE3 <- TempE3[1:SummerWatch,]
  NewE4 <- TempE4[1:AutumnWatch,]

  # then combine together (could do this in one step!)
  NewD <- rbind(NewD2, NewD3, NewD4)
  NewE <- rbind(NewE2, NewE3, NewE4)
  CompDE <- rbind(NewD, NewE)

  # Run a GLM-GEE on the resampled distributions to see if there is a
  # statistical difference between years
  NewGLMGEE1 <- geeglm(CetPres ~ Year + SeaState, data=CompDE, family=binomial,
                       id=Segment, corstr="ar1")
  pv <- summary(NewGLMGEE1)$coefficients[, "Pr(>|W|)"]  ## will extract them
  signif[j] <- pv[2]  # only interested in the significance of Year in the model
}

--
View this message in context: http://r.789695.n4.nabble.com/For-loop-processing-too-slow-pre-format-data-frame-tp4689543.html
Sent from the R help mailing list archive at Nabble.com.
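
One possible way around the unknown-size problem, offered as a rough sketch and not as a tested replacement for the code above: instead of growing a data frame with rbind inside the inner loop, collect the row numbers of the sampled segments and subset the data frame once. The size then never has to be known in advance. Everything below is illustrative: 'toy' is a made-up stand-in for SpeyBay, the names resample_segments, rows_by_segment, n_draws and n_keep are hypothetical, and a placeholder statistic stands in for the geeglm() fit.

```r
# Made-up stand-in for SpeyBay: 50 segments of 1-14 rows each (assumption).
set.seed(1)
seg_sizes <- sample(1:14, 50, replace = TRUE)
toy <- data.frame(
  Segment = rep(seq_along(seg_sizes), times = seg_sizes),
  CetPres = rbinom(sum(seg_sizes), 1, 0.2)
)

# Row numbers belonging to each segment, computed once, outside any loop.
rows_by_segment <- split(seq_len(nrow(toy)), toy$Segment)

# Draw segments with replacement, gather their row numbers in draw order,
# keep at most the first n_keep of them, and subset the data frame once.
resample_segments <- function(data, rows_by_segment, segment_ids,
                              n_draws, n_keep) {
  drawn <- sample(segment_ids, n_draws, replace = TRUE)
  rows <- unlist(rows_by_segment[as.character(drawn)], use.names = FALSE)
  data[rows[seq_len(min(n_keep, length(rows)))], , drop = FALSE]
}

n_boot <- 20
boot_stat <- numeric(n_boot)  # preallocated instead of grown one element at a time
for (j in seq_len(n_boot)) {
  boot <- resample_segments(toy, rows_by_segment, 1:50,
                            n_draws = 40, n_keep = 100)
  # The model fit from the post would go here; a placeholder stands in for it.
  boot_stat[j] <- mean(boot$CetPres)
}
```

In the posted code this would presumably mean one call per year/season block, e.g. with segment_ids = D2Start:D2Stop, n_draws = MaxSample and n_keep = SpringWatch, followed by a single rbind of the six results. The rows come out in the same order as the repeated rbind would give, though the row names will differ.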