On Sat, Oct 23, 2010 at 10:52 AM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:
> Just tried it on my work computer (Windows XP, I only have 2 GB RAM):
> I've run your code, just indicated the separator "|" in read.table (in
> DF line) and added the actual processing (writing out of the result
> with a file name) - see below.
> I got:
> Error in textConnection(x) : cannot allocate memory for text connection
>
> Thanks again for helping!
> Dimitri
>
> ### New code from Gabor:
> k <- 1000000 # no of rows per chunk
> first <- TRUE
> con <- file('myfile.txt', "r")
> count <- 1
>
> repeat {
>
>   start <- Sys.time()
>   print(start)
>   flush.console()
>
>   # skip header
>   if (first) hdgs <- readLines(con, 1)
>   first <- FALSE
>
>   x <- readLines(con, k)
>   if (length(x) == 0) break
>   DF <- read.table(textConnection(x), header = FALSE, sep = "|")
>
>   # process chunk -- we just print last row here
>   end <- Sys.time()
>   print(end - start)
>   print(names(DF))
>   print(tail(DF, 1))
>   flush.console()
>   filename <- paste("Chunk of 1 Mil number ", count, ".txt", sep = "")
>   write.table(DF, sep = "\t", col.names = FALSE, file = filename)
>   count <- count + 1
> }
> close(con)
>
Try smaller chunks. Presumably R cannot handle chunks that large.

Also, you could use RSQLite or sqldf to set up a database and then read
from it.

Here is a self-contained example that you can copy and paste into an R
session. It works on my Windows system, but you might need to change the
eol if you are working on a different platform. Reading the file into the
database is the slowest part, but once it's there the rest should be
reasonably fast. Again, be sure not to read chunks so large that R cannot
handle them.

library(sqldf)

## create test file
DF <- data.frame(a = 1:25, b = 101:125)
write.table(DF, file = "myfile.csv", quote = FALSE, sep = ",",
            row.names = FALSE)

## define connection with attributes
myfile <- file("myfile.csv")
attr(myfile, "file.format") <- list(header = TRUE, sep = ",", eol = "\r\n")

## create new sqlite database
sqldf("attach 'mydb' as new")

## read file into mytab table of mydb database
sqldf("create table mytab as select * from myfile", dbname = "mydb")

## check that it's there
sqldf("select * from sqlite_master", dbname = "mydb")
sqldf("select count(*) from mytab", dbname = "mydb")

## read in 5 rows after skipping the first 10 rows
sqldf("select * from mytab limit 5 offset 10", dbname = "mydb")

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
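
As a footnote to the "try smaller chunks" advice, here is a minimal sketch of one way the chunk loop might be written without textConnection(), by calling read.table() directly on the open connection with nrows. This is not the code from the thread: the temporary test file, the tiny chunk size of 10 and the name chunk_rows are assumptions made only so the example is self-contained. On real data k would be set to whatever fits in memory.

```r
## Build a small pipe-separated test file (stand-in for the real data).
infile <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:25, b = 101:125), file = infile,
            sep = "|", quote = FALSE, row.names = FALSE)

k <- 10                        # rows per chunk; choose a size that fits in RAM
con <- file(infile, "r")       # open once so each read continues where the last stopped

## read the header line once and keep the column names
hdgs <- strsplit(readLines(con, 1), "|", fixed = TRUE)[[1]]

chunk_rows <- integer(0)       # rows seen per chunk, kept only for checking
repeat {
  ## At end of file read.table() either signals an error or returns
  ## zero rows, so both cases are treated as "no more data".
  DF <- tryCatch(
    read.table(con, header = FALSE, sep = "|", nrows = k, col.names = hdgs),
    error = function(e) NULL)
  if (is.null(DF) || nrow(DF) == 0) break

  chunk_rows <- c(chunk_rows, nrow(DF))
  ## process the chunk here, e.g. write.table(DF, ...)
}
close(con)

print(chunk_rows)
```

Reading straight from the connection avoids holding each chunk twice (once as a character vector, once as a data frame), which may help on a machine with little RAM. If column types could differ between chunks, passing colClasses to read.table() would keep them consistent.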