MartinMo> write.table with large data frames takes quite a long time

MartinMo> system.time({
MartinMo> +     write.table(df, '/tmp/dftest.txt', row.names=FALSE)
MartinMo> + }, gcFirst=TRUE)
MartinMo>    user  system elapsed
MartinMo>  97.302   1.532  98.837

MartinMo> A reason is because dimnames is always called, causing 'anonymous' row
MartinMo> names to be created as character vectors. Avoiding this in
MartinMo> src/library/utils, along the lines of

Thank you, Martin.
Note that we needed to fix your patch (for the case where the
dataframe has 'matrix column'), and I'd like to further remark
that I consider '.... == TRUE ' to be quite ugly (or
inefficient) in all circumstances.

Martin Maechler, ETH Zurich

Index: write.table.R
===================================================================
--- write.table.R (revision 44717)
+++ write.table.R (working copy)
@@ -27,13 +27,18 @@

     if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)

+    makeRownames <- is.logical(row.names) && !is.na(row.names) &&
+        row.names==TRUE
+    makeColnames <- is.logical(col.names) && !is.na(col.names) &&
+        col.names==TRUE
     if(is.matrix(x)) {
         ## fix up dimnames as as.data.frame would
         p <- ncol(x)
         d <- dimnames(x)
         if(is.null(d)) d <- list(NULL, NULL)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
-        if(is.null(d[[2]]) && p > 0) d[[2]] <- paste("V", 1:p, sep="")
+        if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
+        if(is.null(d[[2]]) && p > 0 && makeColnames)
+            d[[2]] <- paste("V", 1:p, sep="")
         if(is.logical(quote) && quote)
             quote <- if(is.character(x)) seq_len(p) else numeric(0)
     } else {
@@ -53,8 +58,8 @@
                 quote <- ord[quote]; quote <- quote[quote > 0]
             }
         }
-        d <- dimnames(x)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
+        d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
+                  if (makeColnames==TRUE) names(x) else NULL)
         p <- ncol(x)
     }
     nocols <- p==0

> improves performance at least in proportion to nrow(x):
>
> > system.time({
> +     write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
>    user  system elapsed
>   8.132   0.608   8.899
>
> Martin
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
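
The two points in the thread can be illustrated with a minimal sketch, assuming a current R version. The helper names wants_default_names and wants_default_names_longhand are hypothetical and are not part of utils::write.table; the first uses isTRUE() as one possible replacement for the '== TRUE' idiom, and the second wraps the test from the patch for comparison. The last lines show why calling dimnames() on a data frame with automatic row names is costly: the compactly stored row names are expanded into a character vector.

```r
# Hypothetical helper (not in utils): should default names be generated
# for a row.names / col.names argument? isTRUE() is TRUE only for a
# single, non-NA logical TRUE.
wants_default_names <- function(arg) isTRUE(arg)

# The longhand test used in the patch, wrapped for comparison.
wants_default_names_longhand <- function(arg) {
    is.logical(arg) && !is.na(arg) && arg == TRUE
}

# Scalar values of the kind passed as row.names / col.names.
inputs <- list(TRUE, FALSE, NA, "id")
short <- vapply(inputs, wants_default_names, logical(1))
long  <- vapply(inputs, wants_default_names_longhand, logical(1))
print(identical(short, long))   # do both tests agree on these inputs?

# Automatic row names are stored compactly; .row_names_info() reports a
# negative count in that case.
df <- data.frame(x = 1:5)
print(.row_names_info(df))

# dimnames() goes through row.names(), which materializes the row names
# as a character vector of length nrow(df).
rn <- dimnames(df)[[1]]
print(class(rn))
```

With row.names=FALSE the character vector is never needed, which is the work the patch avoids.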