I am having trouble converting my dataset from long to wide format. Previous attempts using the reshape package and the aggregate function were unsuccessful because they took too long. My simplified solution, however, turns out to be just as slow. My complete code is given below. With sample.size = 10000, execution takes about 20 seconds, but with sample.size = 100000 it seems to take an eternity. My actual sample.size is 15000000, i.e. 15 million.

sample.size <- 10000
m <- data.frame(Name=sample(1:100000, sample.size, TRUE),
                Type=sample(1:1000, sample.size, TRUE),
                Predictor=sample(LETTERS[1:10], sample.size, TRUE))

res <- function(m) {
  m.12.unique <- unique(m[,1:2])
  m.12.unique <- m.12.unique[order(m.12.unique[,1], m.12.unique[,2]),]
  v1 <- paste(m.12.unique[,1], m.12.unique[,2], sep=".")
  v2 <- c(sort(unique(m[,3])))
  res <- matrix(0, nrow=length(v1), ncol=length(v2), dimnames=list(v1, v2))
  m.ids <- paste(m[,1], m[,2], sep=".")
  for(i in 1:nrow(m)) {
    x <- m.ids[i]
    y <- m[i,3]
    res[x, y] <- res[x, y] + 1
  }
  res <- data.frame(m.12.unique[,1], m.12.unique[,2], res, row.names=NULL)
  colnames(res) <- c("Name", "Type", v2)
  return(res)
}

res(m)

> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
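The loop in res() above is, in effect, counting how often each Predictor value occurs for every (Name, Type) pair. One direction that might avoid the row-by-row loop is to cross-tabulate a combined key against Predictor with table(). The following is only a minimal sketch under that assumption: the names count_wide, pairs, pair.keys, key, tab and counts are made up for illustration, and whether the approach is fast enough, or even fits in memory, at 15 million rows is untested.

```r
# Hypothetical helper (not part of the original code): count Predictor
# occurrences per (Name, Type) pair without an explicit row loop.
count_wide <- function(m) {
  # Unique (Name, Type) pairs, sorted by Name and then Type.
  pairs <- unique(m[, c("Name", "Type")])
  pairs <- pairs[order(pairs$Name, pairs$Type), ]

  # One key per row; fixing the factor levels to the sorted pairs keeps
  # the rows of the table in the same order as `pairs`.
  pair.keys <- paste(pairs$Name, pairs$Type, sep = ".")
  key <- factor(paste(m$Name, m$Type, sep = "."), levels = pair.keys)

  # Cross-tabulate key against Predictor: one cell per (pair, value).
  tab <- table(key, as.character(m$Predictor))

  # Convert the contingency table to a plain matrix of counts
  # (both are stored column-major, so the cell order is preserved).
  counts <- matrix(as.vector(tab), nrow = nrow(tab),
                   dimnames = list(NULL, colnames(tab)))

  data.frame(Name = pairs$Name, Type = pairs$Type, counts,
             row.names = NULL, check.names = FALSE)
}

# Small usage example with made-up data.
toy <- data.frame(Name = c(1, 1, 2, 1),
                  Type = c(5, 5, 7, 5),
                  Predictor = c("A", "B", "A", "A"))
print(count_wide(toy))
```

The design choice here is to let table() do the counting in compiled code instead of updating one matrix cell per iteration in interpreted R; the output layout (Name, Type, then one column per Predictor value) is meant to mirror what res() returns.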