You can also use the InfluxDB profiling plugin I developed, which is included in the latest Slurm version. It provides live CPU and memory usage per task, step, host, and job, and you can then build a Grafana dashboard to display the live metrics.
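For reference, enabling it takes roughly the following configuration. This is a sketch, not a site-ready setup: the host, database name, retention policy, and sampling interval below are placeholders you would replace with your own values.

```
# slurm.conf -- select the InfluxDB profiling plugin and how often
# jobacct_gather samples each task (seconds)
AcctGatherProfileType=acct_gather_profile/influxdb
JobAcctGatherFrequency=task=30

# acct_gather.conf -- where the samples are sent
# (host, database, and retention policy are placeholders for your site)
ProfileInfluxDBHost=influxdb.example.com:8086
ProfileInfluxDBDatabase=slurm_profiling
ProfileInfluxDBRTPolicy=autogen
ProfileInfluxDBDefault=Task
```

Users can then request profiling per job, e.g. `sbatch --profile=task job.sh`, and a Grafana dashboard pointed at that InfluxDB database can chart the per-task series live.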
Regards,
Carlos

Sent from my iPhone

> On 9 Dec 2018, at 14:39, Aravindh Sampathkumar <aravi...@fastmail.com> wrote:
>
> Hi All.
>
> I was wondering if anybody has thought of, or hacked together, a way to record
> CPU and memory consumption of a job over its entire duration and give a
> summary of the usage pattern within that job?
> Not the MaxRSS and CPU time that already get reported for every job.
>
> I'm thinking more along the lines of a chart of CPU utilisation, memory usage,
> and disk usage on a per-second basis, or something like that.
>
> I'm asking because some of my users have no clue about the resource consumption
> of their jobs and just blindly ask for far more resources as the "safe" option.
> It would be a nice way for users to learn simple things - for example, that they
> asked for 8 cores but their job ran on just 1 core the entire time because a
> library they used is limited to a single core.
> We use cgroups for process accounting and for limiting each job's CPU and memory
> usage. We also use QOS to limit resource reservations at the user level.
>
> --
> Aravindh Sampathkumar
> aravi...@fastmail.com