Hi Gabor,

Thank you for the prompt reply.  I definitely looked over all of the
examples on the code.google.com sqldf page before sending, which is why I
wrote the code

read.csv.sql("ss09pusa.csv" , sql="create table ss09pusa as select * from
file" , dbname="sqlite")

directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create
table mytab as select * from file",dbname = "mydb")

...but I don't understand how this gets me around the memory problem, since
I think the whole file still gets read into memory.  Is there a way to do
this line by line?
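
Something along these lines is what I have in mind (untested on the full
file, with the chunk size just a guess and probably a colClasses argument
needed so the column types stay consistent across chunks), using the same
database file as in my RSQLite attempt below:

library(RSQLite)
out_db <- dbConnect(SQLite(), dbname = "sqlite.db")

# open the csv once so each read.csv call picks up where the last one stopped
infile <- file("ss09pusa.csv", open = "r")

# the first chunk keeps the column names and creates the table
first <- read.csv(infile, nrows = 10000)
dbWriteTable(out_db, "ss09pusa", first, row.names = FALSE)

# keep appending 10,000-row chunks until the file runs out
repeat {
  chunk <- tryCatch(read.csv(infile, header = FALSE, nrows = 10000,
                             col.names = names(first)),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  dbWriteTable(out_db, "ss09pusa", chunk, append = TRUE, row.names = FALSE)
}

close(infile)
dbDisconnect(out_db)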

I would prefer to use RSQLite directly rather than sqldf anyway, but I could
not get the IMPORT command (or .IMPORT) working at all.  I tried these with
both dbGetQuery and dbSendQuery.

library(RSQLite)
setwd("R:\\American Community Survey\\Data\\2009")
# connect to (or create) a database file in the working directory
out_db <- dbConnect(SQLite(), dbname="sqlite.db")
dbGetQuery(out_db , "create table test (hello integer, world text)")
# the next two statements are the ones that fail
dbGetQuery(out_db , "mode csv")
dbGetQuery(out_db , "import test.csv test")

When I run the mode and import commands, I get an error that makes me think
RSQLite is handling these commands in a completely different way.

> dbGetQuery(out_db , "mode csv")
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near "mode": syntax error)


I suppose I could just run sqlite3 commands from the system() function, but
I was hoping there might be a way to accomplish this task entirely within R?
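
If I do end up going the system() route, I imagine it would look roughly
like the sketch below (untested; it assumes the sqlite3 command line
program is installed and on the path, and depending on the sqlite3 version
the table may need to be created first and the csv header row handled
separately):

# write the sqlite3 shell commands to a small script file
writeLines(c(".mode csv",
             ".import ss09pusa.csv ss09pusa"),
           "import_commands.txt")

# shell() goes through cmd.exe, so the "<" input redirection works on Windows
shell("sqlite3 sqlite.db < import_commands.txt")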



Thanks again!



On Mon, Nov 15, 2010 at 10:41 AM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> On Mon, Nov 15, 2010 at 10:07 AM, Anthony Damico <ajdam...@gmail.com>
> wrote:
> > Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32.  I'm trying
> to
> > insert a very large CSV file into a SQLite database.  I'm pretty new to
> > working with databases in R, so I apologize if I'm overlooking something
> > obvious here.
> >
> > I'm trying to work with the American Community Survey data, which is two
> > 1.3GB csv files.  I have enough RAM to read one of them into memory, but
> not
> > both at the same time.  So, in order to analyze them, I'm trying to get
> them
> > into a SQLite database so I can use the R survey package's
> database-backed
> > survey objects capabilities (
> > http://faculty.washington.edu/tlumley/survey/svy-dbi.html).
> >
> > I need to combine both of these CSV files into one table (within a
> > database), so I think that I'd need a SQL manipulation technique that
> reads
> > everything line by line, instead of pulling it all into memory.
> >
> > I've tried using read.csv.sql, but it finishes without an error and then
> > only shows me the table structure when I run the final select statement.
> > When I run these exact same commands on a smaller CSV file, they work
> fine.
> > I imagine this is not working because the csv is so large, but I'm not
> sure
> > how to confirm that or what to change if it is.  I do want to get all
> > columns from the CSV into the data table, so I don't want to filter
> > anything.
> >
> > library(sqldf)
> > setwd("R:\\American Community Survey\\Data\\2009")
> > sqldf("attach 'sqlite' as new")
> > read.csv.sql("ss09pusa.csv" , sql="create table ss09pusa as select * from
> > file" , dbname="sqlite")
> > sqldf("select * from ss09pusa limit 3",dbname="sqlite")
> >
>
> What the above code does, which is unlikely to be what you intended,
> is to create an sqlite database called 'sqlite', read the indicated
> file into sqlite, read it from sqlite into R (clearly this step will
> fail if the data is too big for R, but if it is not then you are ok),
> and then delete the table from the database.  So your sqldf statement
> should give an error since there is no such table, or else, if you
> have a data frame in your R workspace called ss09pusa, the sqldf
> statement will load that into a database table, retrieve its first
> three rows, and then delete the table.
>
> This sort of task is probably more suitable for RSQLite than sqldf
> but if you wish to do it with sqldf you need to follow example 9 or
> example 10 on the sqldf home page:
>
> In example 9,
>
> http://code.google.com/p/sqldf/#Example_9.__Working_with_Databases
>
> it is very important to note that sqldf automatically deletes any
> table it created once the sqldf or read.csv.sql statement is done, so
> the way to keep the table from being dropped is to make sure it is
> created by an sql statement that you issue, "create table mytab as
> select ...", rather than by sqldf itself.
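>
> Adapting the file name from your code, that pattern looks roughly
> like this (a sketch only, using a database file called mydb):
>
> sqldf("attach 'mydb' as new")
> read.csv.sql("ss09pusa.csv",
>     sql = "create table ss09pusa as select * from file",
>     dbname = "mydb")
> sqldf("select count(*) from ss09pusa", dbname = "mydb")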
>
> In example 10,
>
> http://code.google.com/p/sqldf/#Example_10._Persistent_Connections
>
> persistent connections are illustrated, which are an alternative way
> to do this in sqldf.
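>
> The general shape of that approach is something like the sketch
> below; see the page for the exact details (for instance, how to
> point the persistent connection at a file database rather than the
> default in-memory one):
>
> sqldf()                # opens a connection that later calls reuse
> read.csv.sql("ss09pusa.csv",
>     sql = "create table ss09pusa as select * from file")
> sqldf("select count(*) from ss09pusa")
> sqldf()                # a second empty call closes the connection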
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
