Check out read.csv.sql in the sqldf package. It reads the file directly into an SQLite database, bypassing R, and then pulls the result from the database into R. It sets up the database and the table layout for you and destroys the database when finished, so reading a file takes just one line of R code. It can also read only a portion of the file, namely any subset that can be specified in SQL. See the examples on the home page: http://sqldf.googlecode.com
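For illustration, here is a minimal sketch of that one-line read. It assumes sqldf is installed; the temporary file, the column names (id, value) and the filter condition are made up for the example and are not from your data. Inside the SQL statement the input is referred to as the table "file".

```r
library(sqldf)

# Hypothetical sample data, written to a temporary CSV file just for this sketch
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 1.5),
          tmp, row.names = FALSE, quote = FALSE)

# Read only the rows matching the WHERE clause; the filtering happens in SQLite,
# so the rows that do not match are never turned into R objects
subset_df <- read.csv.sql(tmp, sql = "select * from file where id > 7")

# Clean up the temporary file
unlink(tmp)
```

With a real file you would replace tmp with its path and adjust the SQL; leaving the sql argument at its default ("select * from file") reads the whole file.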
On Tue, Mar 16, 2010 at 12:51 PM, Joe Calderon <calderon....@gmail.com> wrote:
> hello *, im running into two major bottlenecks an R script.
>
> 1. going through a 40mb file and reading in via readLines() 1 line at
> a time is almost an order of magnitude slow than the equivalent in
> python, im wondering if there are alternatives to readLines(), doing
> more lines at a time helps a bit
>
> 2. generating date sequences takes a long time, im basically doing
> something like seq.Date(Sys.Date(), length.out = 300, by ='day') a lot
> while digging into it, i strace'd the running process and it seems the
> bulk of the time is spent checking for /etc/localtime
>
> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>
>
> strace -cp 2964
> Process 2964 attached - interrupt to quit
> ^CProcess 2964 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  94.61    0.006387           0     55872           stat
>   2.58    0.000174           0       568           read
>   1.42    0.000096           0       285           write
>   1.39    0.000094           1       137           brk
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.006751                 56862           total
>
>
>
> has anybody ran into similar problems?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.