Perhaps I will need to create a temp table, but I am asking if there is a way 
to avoid it.  It would be great if there were a way to tie the R data frame 
temporarily to the query in a transparent fashion. If not, I will see if I can 
create/drop the temp table directly from sqlQuery.
-Avram


 
On Thursday, September 11, 2008, at 12:07PM, "Aaron Mackey" <[EMAIL PROTECTED]> 
wrote:
>Sorry, I see now you want to avoid this, but you did ask what was the
>"best way to efficiently ...", and the temp. table solution certainly
>matches your description.  What's wrong with using a temporary table?
>
>-Aaron
>
>On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROTECTED]> wrote:
>> I would load your set of userid's into a temporary table in oracle,
>> then join that table with the rest of your SQL query to get only the
>> matching rows out.
>>
>> -Aaron
>>
>> On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote:
>>>
>>> Dear R list,
>>>
>>> What is the best way to efficiently marry an R dataset with a very large 
>>> (Oracle) database table?
>>>
>>> The goal is to only return Oracle table rows that match IDs present in the 
>>> R dataset.
>>> I have an R data frame with 2000 user IDs analogous to: r = 
>>> data.frame(userid=round(runif(2000)*100000,0))
>>>
>>> ...and I need to pull data from an Oracle table only for these 2000 IDs.  
>>> The Oracle table is quite large. Additionally, the sql query may need to 
>>> join to other tables to bring in ancillary fields.
>>>
>>> I currently connect to Oracle via odbc:
>>>
>>> library(RODBC)
>>> connection <- odbcConnect("****", uid="****", pwd="****")
>>> d = sqlQuery(connection, "select userid, x, y, z from largetable where 
>>> timestamp > sysdate -7")
>>>
>>> ...allowing me to pull data from the database table into the R object "d" 
>>> and then use the R merge function.  The problem however is that if "d" is 
>>> too large it may fail due to memory limitations or be inefficient.  I would 
>>> like to push the merge portion to the database and it would be very 
>>> convenient if it were possible to request that the query look to the R 
>>> object for the ID's to which it should restrict the output.
>>>
>>> Is there a way to do this?
>>> Something like the following fictional code:
>>> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where 
>>> r$userid=t.userid")
>>>
>>> Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? 
>>>   This would be convenient and help me avoid needing to create a temporary 
>>> table to store the R data, join via sql, then return the data back to R.
>>>
>>> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 .
>>> Thanks for your comments, ideas, recommendations.
>>>
>>>
>>> -Avram
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to