Perhaps I will need to create a temp table, but I am asking if there is a way to avoid it. It would be great if there were a way to tie the R data frame temporarily to the query in a transparent fashion. If not, I will see if I can create/drop the temp table directly from sqlQuery. -Avram
On Thursday, September 11, 2008, at 12:07PM, "Aaron Mackey" <[EMAIL PROTECTED]> wrote: >Sorry, I see now you want to avoid this, but you did ask what was the >"best way to efficiently ...", and the temp. table solution certainly >matches your description. What's wrong with using a temporary table? > >-Aaron > >On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROTECTED]> wrote: >> I would load your set of userid's into a temporary table in oracle, >> then join that table with the rest of your SQL query to get only the >> matching rows out. >> >> -Aaron >> >> On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: >>> >>> Dear R list, >>> >>> What is the best way to efficiently marry an R dataset with a very large >>> (Oracle) database table? >>> >>> The goal is to only return Oracle table rows that match IDs present in the >>> R dataset. >>> I have an R data frame with 2000 user IDs analogous to: r = >>> data.frame(userid=round(runif(2000)*100000,0)) >>> >>> ...and I need to pull data from an Oracle table only for these 2000 IDs. >>> The Oracle table is quite large. Additionally, the sql query may need to >>> join to other tables to bring in ancillary fields. >>> >>> I currently connect to Oracle via odbc: >>> >>> library(RODBC) >>> connection <- odbcConnect("****", uid="****", pwd="****") >>> d = sqlQuery(connection, "select userid, x, y, z from largetable where >>> timestamp > sysdate -7") >>> >>> ...allowing me to pull data from the database table into the R object "d" >>> and then use the R merge function. The problem however is that if "d" is >>> too large it may fail due to memory limitations or be inefficient. I would >>> like to push the merge portion to the database and it would be very >>> convenient if it were possible to request that the query look to the R >>> object for the ID's to which it should restrict the output. >>> >>> Is there a way to do this? >>> Something like the following fictional code: >>> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where >>> r$userid=t.userid") >>> >>> Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? >>> This would be convenient and help me avoid needing to create a temporary >>> table to store the R data, join via sql, then return the data back to R. >>> >>> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . >>> Thanks for your comments, ideas, recommendations. >>> >>> >>> -Avram >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.