Perhaps I have confused the issue. When I initially said "data points" I
meant one standalone analysis, not one piece of data. Each analysis
takes 1.5 seconds. I have not yet implemented a run over the
whole dataset, but I would expect it to take about 5 to 10 hours.
This is just about acceptable, but it would be better if it were
quicker. As I said, the exact analysis method has not yet been
determined, and if the final method were significantly more
computationally intensive, that could be an issue.
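As a rough check on those figures, here is a small Python sketch (not the actual analysis, which has not been settled). Only the 1.5 seconds per analysis and the 5 to 10 hour estimate come from the text above. The function names and the core counts are made up for illustration, and the scaling assumes the analyses are independent and parallelise perfectly, which real runs rarely do.

```python
# Back-of-envelope runtime estimate.
# From the message: 1.5 s per analysis, 5 to 10 hours expected in total.
# Everything else (names, core counts, perfect scaling) is an assumption.
SECONDS_PER_ANALYSIS = 1.5


def analyses_in(hours, seconds_per_analysis=SECONDS_PER_ANALYSIS):
    """Number of sequential analyses that fit in the given wall-clock hours."""
    return hours * 3600 / seconds_per_analysis


def wall_clock_hours(n_analyses, cores=1,
                     seconds_per_analysis=SECONDS_PER_ANALYSIS):
    """Ideal wall-clock hours, assuming independent analyses and perfect scaling."""
    return n_analyses * seconds_per_analysis / cores / 3600


if __name__ == "__main__":
    # How many analyses does the 5 to 10 hour estimate imply?
    for hours in (5, 10):
        print(f"{hours} h sequential ~ {analyses_in(hours):,.0f} analyses")

    # Hypothetical core counts, to see what parallelisation could buy.
    for cores in (1, 4, 8):
        hours = wall_clock_hours(24_000, cores)
        print(f"{cores} core(s): {hours:.2f} h for 24,000 analyses")
```

At 1.5 seconds each, 5 to 10 hours corresponds to roughly 12,000 to 24,000 sequential analyses, and an ideal split over 4 cores would bring the 10 hour case down to about 2.5 hours.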
It is not actually a simulation; it is a pre-analysis of the dataset
before public display. I do have a simulation of the analysis to run,
and that could be some orders of magnitude larger than the real
dataset. I can of course wait for that.
Thanks for the input.
On 05/08/2012 05:24 PM, Bert Gunter wrote:
> Probably just pointing out the obvious, but:
> 200,000 data points may not be that many these days, depending on the
> dimensionality of the data. Nor is 10 times that number, neither now
> nor in 5 years, again depending on data dimensionality. So my question
> is, have you actually tried running your simulations -- or a
> reasonable approximation thereof -- on a single "cheap" machine? It
> might be that your concerns are overblown, especially with multicore
> and parallelization.
> Obviously, ignore if you've already done this and know it's nonsense.
> Cheers,
> Bert
> On Tue, May 8, 2012 at 8:50 AM, Hugh Morgan <h.mor...@har.mrc.ac.uk> wrote:
>> On 05/08/2012 12:14 PM, Zhou Fang wrote:
>>> How many data points do you have?
>> Currently 200,000. We are likely to have 10 times that in 5 years.
>>> Why buy when you can rent? Unless your hardware is going to be
>>> running 24/7 doing these analyses then you are paying for it to sit
>>> idle. You might be better off purchasing computing time from Amazon or
>>> another cloud computing provider. If you need to run more analyses
>>> quickly, just buy some more virtual hosts.
>> Because of the nature of the funding we are likely to be better off buying.
>> We are likely to be running most of the time, most of the analysis must be
>> rerun as more data becomes available, and that is likely to happen a few
>> times every week.
>> Thank you for all the pointers, we shall consider them all.
This email may have a PROTECTIVE MARKING, for an explanation please see:
http://www.mrc.ac.uk/About/Informationandstandards/Documentmarking/index.htm
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.