On Fri, Oct 22, 2010 at 6:41 PM, Mike Marchywka <marchy...@hotmail.com> wrote:
>> From: ggrothendi...@gmail.com
>> Date: Fri, 22 Oct 2010 18:28:14 -0400
>> To: dimitri.liakhovit...@gmail.com
>> CC: r-help@r-project.org
>> Subject: Re: [R] How long does skipping in read.table take
>>
>> On Fri, Oct 22, 2010 at 5:17 PM, Dimitri Liakhovitski wrote:
>> > I know I could figure it out empirically - but maybe based on your
>> > experience you can tell me if it's doable in a reasonable amount of time:
>> > I have a table (in .txt) with 17,000,000 rows (and 30 columns).
>> > I can't read it all in (there are many strings). So I thought I could
>> > read it in in parts (e.g., 1 million rows at a time) using nrows= and skip=.
>> > I was able to read in the first 1,000,000 rows with no problem in 45 sec.
>> > But then I tried to skip 16,999,999 rows and then read things in. Then
>> > R crashed. Should I try again - or is it too many rows to skip for R?
What we are doing is not related to that. It's simply that the default back end of sqldf, SQLite, can be faster than R and can handle larger data sets too (since SQLite does not have to hold everything in memory the way R does), so pushing as much of the work as possible onto SQLite and then pulling only what you need into R at the end can get around bottlenecks in R. Since it's just a matter of writing one line of code, read.csv.sql vs. read.csv, it's relatively simple to try out.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
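P.S. Here is a minimal sketch of the kind of call meant above, not a tested recipe: the file name, the column name, and the value in the WHERE clause are made up for illustration, and it assumes a comma-separated file. read.csv.sql stages the file in a temporary on-disk SQLite database, runs the SQL there, and returns only the result to R, so R never has to hold all 17,000,000 rows.

library(sqldf)

# keep only the rows that satisfy a condition
# (SQLite does the filtering, so R only ever sees the matching rows)
DF <- read.csv.sql("big.txt",
                   sql = "select * from file where region = 'East'",
                   header = TRUE, sep = ",")

# something closer to the nrows=/skip= idea: skip 16,000,000 rows
# and read the next 1,000,000 using SQLite's limit/offset
chunk <- read.csv.sql("big.txt",
                      sql = "select * from file limit 1000000 offset 16000000",
                      header = TRUE, sep = ",")

In the sql= argument the table is always referred to as "file"; that name comes from read.csv.sql itself, while everything else above is just an assumed example.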