I just tried it on my work computer (Windows XP, only 2 GB of RAM). I ran your code, specifying the separator "|" in read.table (on the DF line) and adding the actual processing (writing each chunk out to its own file); see below. I got:

Error in textConnection(x) : cannot allocate memory for text connection
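One possible way around the textConnection() step, offered only as a sketch: read.table() can read straight from the already-open file connection, nrows rows at a time, so the chunk never has to be copied into a text connection first. Whether this fits in 2 GB with 1,000,000-row chunks is untested. The demo file, its column names, and the tiny chunk size below are made up for illustration.

```r
# Hypothetical demo file standing in for the real 'myfile.txt'.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("id|value", paste(1:25, paste0("v", 1:25), sep = "|")), demo_file)

k <- 10                      # rows per chunk (tiny here; much larger in practice)
con <- file(demo_file, "r")  # open once so each read continues where the last stopped

# Read the header line once and split it into column names.
hdgs <- strsplit(readLines(con, 1), "|", fixed = TRUE)[[1]]

chunk_rows <- integer(0)     # records how many rows each chunk contained
repeat {
  # read.table() on an open connection reads the next k rows.
  # At end of input it either signals an error or returns zero rows,
  # depending on the arguments; both cases end the loop below.
  DF <- tryCatch(
    read.table(con, header = FALSE, sep = "|", nrows = k,
               col.names = hdgs, stringsAsFactors = FALSE),
    error = function(e) NULL
  )
  if (is.null(DF) || nrow(DF) == 0) break

  # process chunk here (e.g. write.table(DF, ...)); we only count rows
  chunk_rows <- c(chunk_rows, nrow(DF))
}
close(con)
print(chunk_rows)
```

A smaller k with the original readLines()/textConnection() approach may also avoid the error, at the cost of producing more output files.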
Thanks again for helping!
Dimitri

### New code from Gabor:
k <- 1000000 # no of rows per chunk
first <- TRUE
con <- file('myfile.txt', "r")
count <- 1
repeat {

  start <- Sys.time()
  print(start)
  flush.console()

  # skip header
  if (first) hdgs <- readLines(con, 1)
  first <- FALSE

  x <- readLines(con, k)
  if (length(x) == 0) break
  DF <- read.table(textConnection(x), header = FALSE, sep = "|")

  # process chunk -- print timing and last row, then write the chunk out
  end <- Sys.time()
  print(end - start)
  print(names(DF))
  print(tail(DF, 1))
  flush.console()
  filename <- paste("Chunk of 1 Mil number ", count, ".txt", sep = "")
  # write.table() has no 'header' argument; col.names = FALSE omits the header row
  write.table(DF, sep = "\t", col.names = FALSE, file = filename)
  count <- count + 1

}
close(con)

On Sat, Oct 23, 2010 at 10:19 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Sat, Oct 23, 2010 at 10:07 AM, Dimitri Liakhovitski
> <dimitri.liakhovit...@gmail.com> wrote:
>> I just tried it:
>>
>> for(i in 11:16){ #i<-11
>> start<-Sys.time()
>> print(start)
>> flush.console()
>> filename<-paste("skipped millions- ",i,".txt",sep="")
>> mydata<-read.csv.sql("myfilel.txt", sep="|", eol="\r\n", sql =
>> "select * from file limit 1000000, (1000000*i-1)")
>
> The SQL statement does not know anything about R variables. You would
> need something like this:
>
>> i <- 1
>> s <- sprintf("select * from file limit 10, %d", 10*i-1)
>> s
> [1] "select * from file limit 10, 9"
>> read.csv.sql(..., sql = s, ...)
>
> Also if you just want to read it in as chunks reading from a
> connection in R would be sufficient:
>
> k <- 5000 # no of rows per chunk
> first <- TRUE
> con <- file('myfile.csv', "r")
> repeat {
>
>   # skip header
>   if (first) hdgs <- readLines(con, 1)
>   first <- FALSE
>
>   x <- readLines(con, k)
>   if (length(x) == 0) break
>   DF <- read.csv(textConnection(x), header = FALSE)
>
>   # process chunk -- we just print last row here
>   print(tail(DF, 1))
>
> }
> close(con)
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.