On Sun, 14 Dec 2008, Mark Wardle wrote:
Hi.
You don't need to download the whole output table to look for an
already-generated answer. You can write a SQL query to do that instead,
i.e. "give me any rows with these parameters...". Get the database to do
the work; that is what it is designed to do.
So the procedure is:
1. Get input parameters
2. Query the output database to see whether analysis has already been
done (select * from output_table where...)
3. If not already done, do the calculation and insert result into output table
Note: you don't have to use sqlSave to save data; you can add single
rows by running arbitrary SQL. (A rough sketch of this pattern follows below.)
Or sqlUpdate.
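For illustration, here is a rough sketch of the whole check-then-insert
pattern with RODBC. The DSN, table, and column names are assumptions, not
the actual schema:

library(RODBC)
ch <- odbcConnect("mydsn")                 # hypothetical DSN

## 1. the input parameters for this run (illustrative values)
version <- 3; a <- 5; b <- 10

## 2. ask the database whether this combination is already there
qry <- sprintf(
  "SELECT Value FROM output_table WHERE version = %d AND a = %d AND b = %d",
  version, a, b)
existing <- sqlQuery(ch, qry)

## 3. only compute and insert if nothing was found
if (nrow(existing) == 0) {
  value  <- a * b                          # stand-in for the real analysis
  newrow <- data.frame(Value = value, version = version, a = a, b = b)
  sqlSave(ch, newrow, tablename = "output_table",
          append = TRUE, rownames = FALSE)
}

odbcClose(ch)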
Mark
2008/12/12 Brigid Mooney <bkmoo...@gmail.com>:
I am using R as a data manipulation tool for a SQL database. So in some of
my R scripts I use the RODBC package to retrieve data, then run the analysis,
and use the sqlSave function in the RODBC package to store the results in a
database.
There are two problems I want to avoid, and they are closely related: (1)
having R rerun analysis that has already been done and saved to the output
database table, and (2) ending up with more than one identical row in
my output database table.
-------------------------------------
The analysis I am running allows the user to input a large number of
variables, for example:
date, version, a, b, c, d, e, f, g, ...
After R completes its analysis, I write the results to a database table in
the format:
Value, date, version, a, b, c, d, e, f, g, ...
where Value is the result of the R analysis, and the rest of the columns are
the criteria that were used to get that value.
--------------------------------------
Can anyone think of a way to address these problems? The only thing I can
think of so far is to run an sqlQuery at the start to get a table of all the
variable combinations that have already been saved, and then simply avoid
recomputing and re-outputting those results (roughly as in the sketch below).
However, my results table currently has over 200K rows (and will grow very
quickly as this project continues), so I don't think that would be the most
expeditious answer: just the SQL query to download 200K rows x 10+ columns is
going to be time-consuming in and of itself.
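(For reference, a rough sketch of the approach I mean, with made-up table
and column names:)

library(RODBC)
ch <- odbcConnect("mydsn")                         # hypothetical DSN

## pull every parameter combination that has already been saved
saved <- sqlQuery(ch, "SELECT version, a, b, c FROM output_table")

## before running the analysis for a new combination, check it here;
## params is a one-row data frame with the same column names
already_done <- function(params) nrow(merge(saved, params)) > 0

odbcClose(ch)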
I know this is kind of a weird problem, and I am open to all sorts of ideas...
Thanks!
--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.