On Mon, 15 Nov 2010 13:28:40 -0500, Anthony Damico wrote: > Do you have any other ideas as to how I might diagnose what's going on > here? Or, alternatively, is there some workaround that would get this giant > CSV into a database? If you think there's a reasonable way to use the > IMPORT command with RSQLite, that seems like it would import the fastest, > but I don't know that it's compatible with DBI on Windows. > > Thanks again! > Anthony
If you are able to successfully read in one entire 1.2 GB file at a time, I would skip sqldf and do this (after setting the working directory appropriately): connSQLite = dbConnect(dbDriver("SQLite"), dbname = "acs") ss09pusa = read.csv("ss09pusa.csv", header = TRUE) dbWriteTable(connSQLite, "acs2009", ss09pusa) rm(ss09pusa) ss09pusb = read.csv("ss09pusb.csv", header = TRUE) dbWriteTable(connSQLite, "acs2009", ss09pusb, append = TRUE) rm(ss09pusb) #Test select of specific columns sql = "SELECT ST, PUMA, ADJINC FROM acs2009 WHERE ST = 33" dfIncome = dbGetQuery(connSQLite, sql) I was *not* able to load one entire table at a time, so I was able to make it work by combining sqldf to read in chunks at a time and dbWriteTable from RSQLite to write each chunk to the database. The read would then look like, for example: ss09pusa = read.csv.sql("ss09pusa.csv", sql = paste("SELECT * FROM file WHERE ST =", i)) where i is an iteration over the state FIPS codes. (You could just use 1 to 56, even though there's a few missing numbers in there. The searches for nonexistent records will take time but will otherwise be harmless.) The dbWriteTable would be the same, with every write after the first one using append = TRUE. Also, I assume you will want to do the read from csv / write to SQLite *once*, then maintain the SQLite database for other sessions. We also use large census and IPUMS data sets, and we keep everything in a Postgres backend, which we access with RPostgreSQL. Much easier to keep everything organized in an RDBMS than to pass around these monstrous csv files. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.