For the simpler questions (for the overall job step, not real-time), you can run 'sacct --format=all' to get data on completed jobs, and then:
- compare the MaxRSS column to the ReqMem column to see how far off their
  memory request was
- compare the TotalCPU column to the product of the NCPUS and ElapsedRaw
  columns to see how far off their core request was

> On Dec 9, 2018, at 7:39 AM, Aravindh Sampathkumar <aravi...@fastmail.com> wrote:
>
> Hi All.
>
> I was wondering if anybody has thought of or hacked around a way to record
> CPU and memory consumption of a job during its entire duration and give a
> summary of the usage pattern within that job?
> Not the MaxRSS and CPU Time that already get reported for every job.
>
> I'm thinking more like a chart of CPU utilisation, memory usage, and disk
> usage on a per-second basis or something like that.
>
> Asking because some of my users have no clue about the resource consumption
> of their jobs, and just blindly ask for way more resources as a "safe" option.
> It would be a nice way for users to learn simple things like: they asked for
> 8 cores, but their job ran on just 1 core the entire time because a library
> they used is single-core limited.
> We use cgroups for process accounting and for limiting each job's CPU and
> memory usage. We also use QOS for limiting resource reservations at the
> user level.
>
> --
> Aravindh Sampathkumar
> aravi...@fastmail.com
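The two comparisons suggested above can be scripted. The following is a minimal sketch (not from the original posts) that parses fields as they might come out of 'sacct --parsable2'; the exact formats of MaxRSS, ReqMem, and TotalCPU vary across Slurm versions, so the parsing rules and the sample numbers here are assumptions for illustration only.

```python
def to_bytes(s):
    """Parse a Slurm size string such as '1234K' or '8G' into bytes."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
    s = s.rstrip("nc")  # ReqMem may carry a per-node (n) or per-core (c) suffix
    if s and s[-1] in units:
        return float(s[:-1]) * units[s[-1]]
    return float(s or 0)

def to_seconds(s):
    """Parse Slurm CPU time like '[DD-]HH:MM:SS[.sss]' into seconds."""
    days = 0
    if "-" in s:
        d, s = s.split("-", 1)
        days = int(d)
    parts = [float(p) for p in s.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0.0)
    h, m, sec = parts
    return days * 86400 + h * 3600 + m * 60 + sec

def efficiency(row):
    """row: dict of sacct fields -> (memory efficiency, CPU efficiency)."""
    mem_eff = to_bytes(row["MaxRSS"]) / to_bytes(row["ReqMem"])
    # Available CPU-seconds are NCPUS * ElapsedRaw (wall time in seconds)
    wall = int(row["NCPUS"]) * float(row["ElapsedRaw"])
    cpu_eff = to_seconds(row["TotalCPU"]) / wall if wall else 0.0
    return mem_eff, cpu_eff

# Made-up example: job requested 8 GB and 8 cores for an hour, but peaked
# at ~1 GB and used roughly one core's worth of CPU time.
sample = {"MaxRSS": "1048576K", "ReqMem": "8Gn",
          "NCPUS": "8", "ElapsedRaw": "3600", "TotalCPU": "01:00:00"}
mem_eff, cpu_eff = efficiency(sample)
print(f"memory: {mem_eff:.1%} of request, cpu: {cpu_eff:.1%} of allocation")
```

A job like the one in the sample would show up at 12.5% on both axes, which is exactly the "asked for 8 cores, used 1" pattern described below.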