Hi Gabor, Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code
read.csv.sql("ss09pusa.csv" , sql="create table ss09pusa as select * from file" , dbname="sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file",dbname = "mydb") ..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line? I would prefer to use SQLite than sqldf, but I could not get the IMPORT command (or .IMPORT) functioning at all. I tried these with both dbGetQuery and dbSendQuery. library(RSQLite) setwd("R:\\American Community Survey\\Data\\2009") out_db <- dbConnect(SQLite(), dbname="sqlite.db") dbGetQuery(out_db , "create table test (hello integer, world text)") dbGetQuery(out_db , "mode csv") dbGetQuery(out_db , "import test.csv test") When I hit the mode and import commands, it gives me an error that makes me think it's handling these files in a completely different way. > dbGetQuery(out_db , "mode csv") Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near "mode": syntax error) I suppose I could just run sqlite3 commands from the system() function, but I was hoping there might be a way to accomplish this task entirely within R? Thanks again! On Mon, Nov 15, 2010 at 10:41 AM, Gabor Grothendieck < ggrothendi...@gmail.com> wrote: > On Mon, Nov 15, 2010 at 10:07 AM, Anthony Damico <ajdam...@gmail.com> > wrote: > > Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying > to > > insert a very large CSV file into a SQLite database. I'm pretty new to > > working with databases in R, so I apologize if I'm overlooking something > > obvious here. > > > > I'm trying to work with the American Community Survey data, which is two > > 1.3GB csv files. I have enough RAM to read one of them into memory, but > not > > both at the same time. So, in order to analyze them, I'm trying to get > them > > into a SQLite database so I can use the R survey package's > database-backed > > survey objects capabilities ( > > http://faculty.washington.edu/tlumley/survey/svy-dbi.html). > > > > I need to combine both of these CSV files into one table (within a > > database), so I think that I'd need a SQL manipulation technique that > reads > > everything line by line, instead of pulling it all into memory. > > > > I've tried using read.csv.sql, but it finishes without an error and then > > only shows me the table structure when I run the final select statement. > > When I run these exact same commands on a smaller CSV file, they work > fine. > > I imagine this is not working because the csv is so large, but I'm not > sure > > how to confirm that or what to change if it is. I do want to get all > > columns from the CSV into the data table, so I don't want to filter > > anything. > > > > library(sqldf) > > setwd("R:\\American Community Survey\\Data\\2009") > > sqldf("attach 'sqlite' as new") > > read.csv.sql("ss09pusa.csv" , sql="create table ss09pusa as select * from > > file" , dbname="sqlite") > > sqldf("select * from ss09pusa limit 3",dbname="sqlite") > > > > What the above code does, which is unlikely to be what you intended, > is to create an sqlite database called 'sqlite' and then read in the > indicated file into sqlite, read it in into R from sqlite (clearly > this step will fail if the data is too big for R but it its not then > you are ok) and then delete the table from the database so your sqldf > statement should give an error since there is no such table or else if > you have a data frame in your R workspace called ss09pusa the sqldf > statement will load that into a database table and the retrieve its > first three rows and then delete the table. > > This sort of task is probably more suitable for RSQLite than sqldf > but if you wish to do it with sqldf you need to follow example 9 or > example 10 on the sqldf home page: > > In example 9, > > http://code.google.com/p/sqldf/#Example_9.__Working_with_Databases > > its very important to note that sqldf automatically deletes any table > that it created after the sqldf or read.csv.sql statement is done so > to not have the table dropped is to make sure you issue an sql > statement that creates the table, "create table mytab as select ..." > rather than sqldf. > > In example 10, > > http://code.google.com/p/sqldf/#Example_10._Persistent_Connections > > persistent connections are illustrated which represents an alternate > way to do this in sqldf. > > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.