Dear R-users, I would like to speed up a double loop I wrote to detect and remove outliers in a data frame. The idea is to remove any value that differs too much from a previous value. When an outlier is detected, the test must be applied to at most the next 10 values following the last correct one, and each removed value is flagged in a second column.
It works well on a small data frame, but it is far too slow for my real data frame, which has 500,000 rows. Here is a fake data example and the double loop:

myts <- data.frame(x = c(1, 2, 50, 40, 30, 40, 100, 1, 50, 1, 2, 3, 3, 5, 4), y = NA)

for (jj in 1:(nrow(myts) - 10)) {
  for (nn in (jj + 1):(jj + 10)) {
    # compare value nn with value jj, skipping values already removed
    if (!is.na(myts[jj, 1]) && !is.na(myts[nn, 1]) &&
        abs(myts[nn, 1] - myts[jj, 1]) > 15) {
      myts[nn, 2] <- 1    # flag the outlier
      myts[nn, 1] <- NA   # remove the value
    }
  }
}

Can somebody explain how I can speed this up easily? I have heard about vectorization, but I don't really understand how it works.
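One possible direction, given only as a sketch and not timed on 500,000 rows: much of the cost of the loop above probably comes from indexing the data frame element by element, so the version below copies the column into a plain numeric vector and replaces the inner loop with a single vectorized comparison over the 10-value window. The outer loop is kept because each pass depends on values removed by earlier passes. The function name `flag_outliers` and its arguments `window` and `threshold` are hypothetical names chosen for illustration.

```r
# Sketch: same rule as the double loop, but working on plain vectors.
# flag_outliers(), window and threshold are illustrative names only.
flag_outliers <- function(x, window = 10, threshold = 15) {
  n <- length(x)
  flag <- rep(NA_real_, n)
  # the original loop does nothing when there are not more than `window` rows
  if (n <= window) return(data.frame(x = x, y = flag))
  for (jj in seq_len(n - window)) {
    if (is.na(x[jj])) next                # reference value already removed
    idx <- (jj + 1):(jj + window)         # the next `window` positions
    # vectorized test: compare the whole window with x[jj] at once
    bad <- !is.na(x[idx]) & abs(x[idx] - x[jj]) > threshold
    x[idx[bad]] <- NA                     # remove the outliers
    flag[idx[bad]] <- 1                   # and flag them
  }
  data.frame(x = x, y = flag)
}

myts <- data.frame(x = c(1, 2, 50, 40, 30, 40, 100, 1, 50, 1, 2, 3, 3, 5, 4), y = NA)
res <- flag_outliers(myts$x)
# for this example, positions 3, 4, 5, 6, 7 and 9 are set to NA and flagged
```

Within one pass of the outer loop the reference value `x[jj]` never changes, so testing the whole window in one step gives the same result as testing its elements one by one. Here "vectorization" means exactly that: one call to `abs()` and `>` on a 10-element vector instead of ten separate calls.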