write.table with large data frames takes quite a long time

> system.time({
+     write.table(df, '/tmp/dftest.txt', row.names=FALSE)
+ }, gcFirst=TRUE)
   user  system elapsed
 97.302   1.532  98.837
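
The data frame 'df' itself is not shown here. As a hypothetical stand-in, something like the following sketch builds a data frame with automatic row names and times the same call. The size, column names, and temporary output file are assumptions for illustration only and are not the data behind the timings in this message. The sketch also checks what dimnames() returns for such a data frame, since that is the call at issue.

```r
## Hypothetical stand-in for 'df'; size and columns are assumptions.
n <- 1e5
df <- data.frame(id = seq_len(n),
                 x = rnorm(n),
                 g = sample(letters, n, replace = TRUE))

## Automatic row names are stored compactly rather than as n strings;
## a negative value here indicates automatic row names.
.row_names_info(df)

## dimnames() expands them into a character vector of length n.
rn <- dimnames(df)[[1]]
stopifnot(is.character(rn), length(rn) == n)

## Time the write itself, without row names in the output.
out <- tempfile(fileext = ".txt")
system.time(write.table(df, out, row.names = FALSE), gcFirst = TRUE)
unlink(out)
```

Increasing n should make the cost of the write easier to see; the absolute times will depend on the machine and the R version.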

One reason is that dimnames is always called, which causes 'anonymous' row names to be created as character vectors. Avoiding this in src/library/utils, along the lines of

Index: write.table.R
===================================================================
--- write.table.R	(revision 44717)
+++ write.table.R	(working copy)
@@ -27,13 +27,18 @@
     if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)
+    makeRownames <- is.logical(row.names) && !is.na(row.names) &&
+        row.names==TRUE
+    makeColnames <- is.logical(col.names) && !is.na(col.names) &&
+        col.names==TRUE
     if(is.matrix(x)) {
         ## fix up dimnames as as.data.frame would
         p <- ncol(x)
         d <- dimnames(x)
         if(is.null(d)) d <- list(NULL, NULL)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
-        if(is.null(d[[2]]) && p > 0) d[[2]] <- paste("V", 1:p, sep="")
+        if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
+        if(is.null(d[[2]]) && p > 0 && makeColnames)
+            d[[2]] <- paste("V", 1:p, sep="")
         if(is.logical(quote) && quote)
             quote <- if(is.character(x)) seq_len(p) else numeric(0)
     } else {
@@ -53,8 +58,8 @@
                 quote <- ord[quote]; quote <- quote[quote > 0]
             }
         }
-        d <- dimnames(x)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
+        d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
+                  if (makeColnames==TRUE) names(x) else NULL)
         p <- ncol(x)
     }
     nocols <- p==0

improves performance at least in proportion to nrow(x):

> system.time({
+     write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
+ }, gcFirst=TRUE)
   user  system elapsed
  8.132   0.608   8.899

Martin
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793