Gabor, thanks a lot. So I don't really need SQL? That's great. I'll try your code. To finish with the SQL approach, I ran this (I wanted to skip the first 11 million rows):
mydata<-read.csv.sql("my.file.txt", sep="|", eol="\r\n", sql = "select * from file limit 1000000, 10999999")

After 20 min (on a 4-core 64-bit Windows 7 PC with 6 GB RAM (I assume only 4 can be used?)) I got this error:

Error: cannot allocate vector of size 42.0 Mb

So, I guess

On Sat, Oct 23, 2010 at 10:19 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Sat, Oct 23, 2010 at 10:07 AM, Dimitri Liakhovitski
> <dimitri.liakhovit...@gmail.com> wrote:
>> I just tried it:
>>
>> for(i in 11:16){ #i<-11
>>  start<-Sys.time()
>>  print(start)
>>  flush.console()
>>  filename<-paste("skipped millions- ",i,".txt",sep="")
>>  mydata<-read.csv.sql("myfilel.txt", sep="|", eol="\r\n", sql =
>> "select * from file limit 1000000, (1000000*i-1)")
>
> The SQL statement does not know anything about R variables. You would
> need something like this:
>
>> i <- 1
>> s <- sprintf("select from file limit 10, %d", 10*1-1)
>> s
> [1] "select from file limit 10, 9"
>> read.csv.sql(..., sql = s, ...)
>
> Also if you just want to read it in as chunks reading from a
> connection in R would be sufficient:
>
> k <- 5000 # no of rows per chunk
> first <- TRUE
> con <- file('myfile.csv', "r")
> repeat {
>
>   # skip header
>   if (first) hdgs <- readLines(con, 1)
>   first <- FALSE
>
>   x <- readLines(con, k)
>   if (length(x) == 0) break
>   DF <- read.csv(textConnection(x), header = FALSE)
>
>   # process chunk -- we just print last row here
>   print(tail(DF, 1))
>
> }
> close(con)
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
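
A note on the limit clause used in the statement I ran: SQLite reads "limit A, B" as "skip A rows, then return at most B rows". So "limit 1000000, 10999999" skips 1 million rows and asks for nearly 11 million, which may be why memory ran out. Below is a minimal sketch of building the statement from R values, assuming the goal is to skip 11 million rows and read the next 1 million. The helper name chunk_sql is hypothetical (it is not part of sqldf), and the read.csv.sql call is left commented out because it needs the actual file.

```r
# Hypothetical helper (not part of sqldf): build the SQL text for one chunk.
# SQLite interprets "limit A, B" as: skip A rows, then return at most B rows.
chunk_sql <- function(skip, n) {
  # "%.0f" writes whole numbers in full, avoiding forms like 1.1e+07
  sprintf("select * from file limit %.0f, %.0f", skip, n)
}

# Skip the first 11 million rows, then read 1 million rows
s <- chunk_sql(skip = 11e6, n = 1e6)
print(s)

# The string would then be passed on as in the original call (assumed usage):
# mydata <- read.csv.sql("my.file.txt", sep = "|", eol = "\r\n", sql = s)
```

Inside a loop over i, the same helper could be called with skip = 1e6 * i and n = 1e6, so each pass reads a different block of one million rows.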