Hi experts,

I am new to R and am using the sample code below to cap outliers at percentile values. I am working with a large data set (30,000 observations and 150 variables), and the loop that detects outliers and caps them at the lower/upper percentile value takes a long time to run. Is there anything wrong with the code? Can anyone suggest improvements to the script to speed it up?

min_pctle_cut <- 0.01
max_pctle_cut <- 0.99
library(outliers)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'), n, replace=TRUE))
x6 <- factor(1*(x5=='a' | x5=='c'))
# data.frame() rather than cbind(), which would coerce the factors x5 and x6 to integer codes
x <- data.frame(x1, x2, x3, x4, x5, x6)
# keep only the numeric columns
z <- x[, sapply(x, is.numeric)]
# lower and upper percentile cut-offs for each column (row 1 = lower, row 2 = upper)
qs <- sapply(z, function(z) quantile(z, c(min_pctle_cut, max_pctle_cut), na.rm = TRUE))

# The loop below is what takes the time
system.time(for (i in 1:ncol(z)) {
  for (j in 1:nrow(z)) {
    if (z[j,i] < qs[1,i]) z[j,i] <- qs[1,i]
    if (z[j,i] > qs[2,i]) z[j,i] <- qs[2,i]
  }
})

--
View this message in context: http://r.789695.n4.nabble.com/Capping-outliers-tp4094647p4094647.html
Sent from the R help mailing list archive at Nabble.com.
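
One possible way to avoid the double loop is to cap each column in a single vectorized step with pmax() and pmin(). The following is only a sketch under some assumptions: the helper name cap_at_quantiles and the small example data frame are made up for illustration and are not part of the original script, and NA values are assumed to be acceptable as NA in the result.

```r
# Hypothetical helper (not from the original post): cap one numeric vector
# at its lower and upper quantiles in a single vectorized step.
cap_at_quantiles <- function(v, lower = 0.01, upper = 0.99) {
  q <- quantile(v, c(lower, upper), na.rm = TRUE)
  # pmax() raises values below the lower cut-off,
  # pmin() lowers values above the upper cut-off; NAs stay NA.
  pmin(pmax(v, q[[1]]), q[[2]])
}

# Small illustrative data set (assumed, mirrors the shape of the example above)
set.seed(1)
n <- 100
x <- data.frame(x1 = runif(n),
                x2 = runif(n),
                x5 = factor(sample(c("a", "b", "c"), n, replace = TRUE)))

# Apply the helper to the numeric columns only; factors are left untouched.
num_cols <- sapply(x, is.numeric)
capped <- x
capped[num_cols] <- lapply(x[num_cols], cap_at_quantiles)
```

The slow part of the original loop is most likely the element-by-element assignment into a data frame, so working on whole columns should help considerably on 30,000 rows by 150 columns, although the actual gain depends on the data and has not been measured here.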