Hi Anthony, On Mon, Nov 15, 2010 at 7:07 AM, Anthony Damico <ajdam...@gmail.com> wrote: > Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to > insert a very large CSV file into a SQLite database. I'm pretty new to > working with databases in R, so I apologize if I'm overlooking something > obvious here.
Working with large data and doing operations in bounded memory tends not to be among the most obvious things to accomplish in R for new comers. > I need to combine both of these CSV files into one table (within a > database), so I think that I'd need a SQL manipulation technique that reads > everything line by line, instead of pulling it all into memory. Yes, that sounds like the right approach. Below is some sample code that does this and avoids reading the entire data set into memory. The key points are to use read.table on a file connection and to read in a batch of lines at a time. Each batch is inserted into the db using a prepared query. The example is complicated a bit because read.table on a file connection raises an error if no lines are available, so I chose to use tryCatch to handle that. A cleaner approach might be to check the number of rows read and break out of the while loop if the count is less than the batch size. Anyhow, see if this approach works for you. library("RSQLite") file_list <- c("ss09pusa.csv", "ss09pusb.csv") input <- file(file_list[1], "r") db <- dbConnect(SQLite(), dbname="example.sqlite") header <- readLines(input, n = 1) fields <- strsplit(header, ",")[[1]] colTypes <- rep("TEXT", length(fields)) colDecl <- paste(fields, colTypes) sql <- sprintf("CREATE TABLE ss09 (%s)", paste(colDecl, collapse = ", ")) dbGetQuery(db, sql) colClasses <- rep("character", length(fields)) sql.in <- sprintf("INSERT INTO ss09 VALUES (%s)", paste(rep("?", length(fields)), collapse = ",")) chunk_size <- 250000 dbBeginTransaction(db) tryCatch({ while (TRUE) { part <- read.table(input, nrows=chunk_size, sep=",", colClasses = colClasses, comment.char = "") dbGetPreparedQuery(db, sql.in, bind.data = part) } }, error = function(e) { if (grepl("no lines available", conditionMessage(e))) TRUE else stop(conditionMessage(e)) }) dbCommit(db) -- Seth Falcon | @sfalcon | http://userprimary.net/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.