> On 4 Oct 2021, at 18:22, Israel Brewster <ijbrews...@alaska.edu> wrote:
> (…)
> the script owner is talking about wanting to process and pull in “all the
> historical data we have access to”, which would go back several years, not to
> mention the probable desire to keep things running into the foreseeable
> future.
> (…)
> - The largest SELECT workflow currently is a script that pulls all available
> data for ONE channel of each station (currently; I suspect that will change
> to all channels in the near future) and runs some post-processing machine
> learning algorithms on it. This script (written in R, if that makes a
> difference) currently takes around half an hour to run, and is run once every
> four hours. I would estimate about 50% of the run time is data retrieval and
> the rest doing its own thing. I am only responsible for integrating this
> script with the database; what it does with the data (and therefore how long
> that takes, as well as what data is needed) is up to my colleague. I have
> this script running on the same machine as the DB to minimize data transfer
> times.

I suspect that a large portion of the time is spent on downloading this data to the R script. Would it help to rewrite it in PL/R and do (part of) the ML calculations on the DB side?
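
For illustration, a minimal PL/R sketch of what I mean. The table and column names (samples, station, channel, value) and the function name are made up for the example, and mean() is only a stand-in for the actual ML step, which would go in the R body instead:

-- requires the PL/R extension to be installed on the server
CREATE EXTENSION IF NOT EXISTS plr;

CREATE OR REPLACE FUNCTION channel_model_score(text, text)
RETURNS float8 AS $$
  # arg1 = station, arg2 = channel
  # fetch that channel's samples via SPI, so the raw rows never leave the backend
  sql <- sprintf("SELECT value FROM samples WHERE station = '%s' AND channel = '%s'",
                 arg1, arg2)
  d <- pg.spi.exec(sql)
  # placeholder for the real post-processing / ML step
  mean(d$value)
$$ LANGUAGE plr;

-- e.g.: SELECT channel_model_score('AV01', 'BHZ');

That way only the model output has to cross the connection. The trade-off is that the R workload then competes with the database for CPU and memory, but since the script already runs on the DB host that may not change much.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.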