On Sun, 14 Dec 2008, Mark Wardle wrote:
Hi.
You don't need to download the whole output table to look for an
already-generated answer. You can write a SQL query to do that instead,
i.e. "give me any rows with these parameters...". Get the database to do
the work; that is what it is designed to do.
So the procedure is:
1. Get input parameters
2. Query the output database to see whether analysis has already been
done (select * from output_table where...)
3. If not already done, do the calculation and insert result into output table
Note: you don't have to use sqlSave to save data; you can add single
rows by running arbitrary SQL. (A rough sketch of this pattern follows below.)
Or sqlUpdate.
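For illustration, here is a rough sketch of the whole check-then-insert
pattern with RODBC. The DSN, table, and column names are assumptions, not
the actual schema:

library(RODBC)
ch <- odbcConnect("mydsn")                 # hypothetical DSN

## 1. the input parameters for this run (illustrative values)
version <- 3; a <- 5; b <- 10

## 2. ask the database whether this combination is already there
qry <- sprintf(
  "SELECT Value FROM output_table WHERE version = %d AND a = %d AND b = %d",
  version, a, b)
existing <- sqlQuery(ch, qry)

## 3. only compute and insert if nothing was found
if (nrow(existing) == 0) {
  value  <- a * b                          # stand-in for the real analysis
  newrow <- data.frame(Value = value, version = version, a = a, b = b)
  sqlSave(ch, newrow, tablename = "output_table",
          append = TRUE, rownames = FALSE)
}

odbcClose(ch)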
Mark
2008/12/12 Brigid Mooney <bkmoo...@gmail.com>:
I am using R as a data manipulation tool for a SQL database. So in some of
my R scripts I use the RODBC package to retrieve data, then run the analysis,
and use the sqlSave function in the RODBC package to store the results in a
database.
There are two problems I want to avoid, and they are closely related: (1)
having R rerun analysis that has already been done and saved to the output
database table, and (2) ending up with more than one identical row in
my output database table.
-------------------------------------
The analysis I am running allows the user to input a large number of
variables, for example:
date, version, a, b, c, d, e, f, g, ...
After R completes its analysis, I write the results to a database table in
the format:
Value, date, version, a, b, c, d, e, f, g, ...
where Value is the result of the R analysis, and the rest of the columns are
the criteria that were used to get that value.
--------------------------------------
Can anyone think of a way to address these problems? The only thing I can
think of so far is to run an sqlQuery at the start to get a table of all the
variable combinations that have already been saved, and then simply avoid
recomputing and re-outputting those results (roughly as in the sketch below).
However, my results table currently has over 200K rows (and will grow very
quickly as this project continues), so I don't think that would be the most
expeditious answer: just the SQL query to download 200K rows x 10+ columns is
going to be time-consuming in and of itself.
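(For reference, a rough sketch of the approach I mean, with made-up table
and column names:)

library(RODBC)
ch <- odbcConnect("mydsn")                         # hypothetical DSN

## pull every parameter combination that has already been saved
saved <- sqlQuery(ch, "SELECT version, a, b, c FROM output_table")

## before running the analysis for a new combination, check it here;
## params is a one-row data frame with the same column names
already_done <- function(params) nrow(merge(saved, params)) > 0

odbcClose(ch)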
I know this is kind of a weird problem, and I am open to all sorts of ideas...
Thanks!
--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.