> On 4 Oct 2021, at 18:22, Israel Brewster <ijbrews...@alaska.edu> wrote:

(…)

> the script owner is talking about wanting to process and pull in “all the 
> historical data we have access to”, which would go back several years, not to 
> mention the probable desire to keep things running into the foreseeable 
> future.

(…)

> - The largest SELECT workflow currently is a script that pulls all available 
> data for ONE channel of each station (currently, I suspect that will change 
> to all channels in the near future), and runs some post-processing machine 
> learning algorithms on it. This script (written in R, if that makes a 
> difference) currently takes around half an hour to run, and is run once every 
> four hours. I would estimate about 50% of the run time is data retrieval and 
> the rest doing its own thing. I am only responsible for integrating this 
> script with the database; what it does with the data (and therefore how long 
> that takes, as well as what data is needed) is up to my colleague. I have 
> this script running on the same machine as the DB to minimize data transfer 
> times.

I suspect that a large portion of the time is spent on downloading this data to 
the R script. Would it help to rewrite it in PL/R and do (part of) the ML 
calculations on the DB side?
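
As a rough sketch of what I mean, assuming the PL/R extension is installed on 
the server: the table and column names (readings, station_id, channel, reading, 
ts) and the function name are made up for illustration, and mean() is only a 
stand-in for whatever the real post-processing step is.

```sql
-- Requires the PL/R extension to be available on the server.
CREATE EXTENSION IF NOT EXISTS plr;

-- Hypothetical function: the body is plain R and runs inside the database.
-- A float8[] argument arrives in R as a numeric vector named "vals".
CREATE OR REPLACE FUNCTION summarize_channel(vals float8[])
RETURNS float8 AS $$
  mean(vals)   # placeholder for the actual ML / post-processing step
$$ LANGUAGE plr;

-- Only one summary row per station leaves the database,
-- instead of every raw reading being shipped to the R script.
SELECT station_id,
       summarize_channel(array_agg(reading ORDER BY ts)) AS summary
FROM   readings              -- hypothetical table
WHERE  channel = 'Z'         -- hypothetical channel name
GROUP  BY station_id;
```

Whether this pays off depends on how much of the calculation can be expressed 
per station; if the algorithm needs all stations at once, the gain will be 
smaller.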

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


