This is the idea behind XDMod's SUPReMM. It generates a ton of data, though, so it does not scale to very active systems (i.e. those churning through tens of thousands of jobs).

https://github.com/ubccr/xdmod-supremm

-Paul Edmon-


On 12/9/2018 8:39 AM, Aravindh Sampathkumar wrote:
Hi All.

I was wondering if anybody has thought of, or hacked together, a way to record the CPU and memory consumption of a job over its entire duration and give a summary of the usage pattern within that job?
Not the MaxRSS and CPU Time that already get reported for every job.

I'm thinking more like a chart of CPU utilisation, memory usage, and disk usage on a per-second basis or something like that.

Asking because some of my users have no clue about the resource consumption of their jobs, and just blindly ask for way more resources as a "safe" option. It would be a nice way for users to learn simple things like - they asked for 8 cores, but their job ran on just 1 core the entire time because a library they used is limited to a single core. We use cgroups for process accounting and for limiting each job's CPU and memory usage. We also use QoS to limit resource reservations at the user level.
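
To make the request concrete, here is a rough sketch of the kind of per-second sampling described above. It assumes cgroup v1 accounting files (cpuacct.usage in nanoseconds, memory.usage_in_bytes) and that the job's cgroup directories are already known; the directory arguments and all function names are hypothetical, and the actual layout depends on the site's configuration. It is an illustration only, not how SUPReMM or Slurm does it.

```python
import time
from pathlib import Path


def cpu_cores_used(prev_ns, curr_ns, elapsed_s):
    """Average number of busy cores between two cpuacct.usage readings (ns)."""
    return (curr_ns - prev_ns) / (elapsed_s * 1e9)


def summarize(samples, cores_requested):
    """Summarise (cores_used, mem_bytes) samples against the requested cores."""
    cores = [c for c, _ in samples]
    mem = [m for _, m in samples]
    avg = sum(cores) / len(cores)
    return {
        "avg_cores": avg,
        "peak_cores": max(cores),
        "peak_mem_bytes": max(mem),
        # Fraction of the requested cores that was actually kept busy.
        "cpu_efficiency": avg / cores_requested,
    }


def sample_job(cpu_dir, mem_dir, n_samples, interval_s=1.0):
    """Poll a job's cgroup files and return (cores_used, mem_bytes) samples.

    Assumption: cpu_dir and mem_dir are the job's cgroup v1 cpuacct and
    memory directories. These paths are site-specific and hypothetical here.
    """
    cpu_file = Path(cpu_dir) / "cpuacct.usage"
    mem_file = Path(mem_dir) / "memory.usage_in_bytes"
    samples = []
    prev_ns = int(cpu_file.read_text())
    prev_t = time.monotonic()
    for _ in range(n_samples):
        time.sleep(interval_s)
        curr_ns = int(cpu_file.read_text())
        curr_t = time.monotonic()
        mem_bytes = int(mem_file.read_text())
        samples.append((cpu_cores_used(prev_ns, curr_ns, curr_t - prev_t), mem_bytes))
        prev_ns, prev_t = curr_ns, curr_t
    return samples
```

With samples like these, a job that requested 8 cores but kept only 1 busy would show a cpu_efficiency of 0.125 in the summary, which is the sort of feedback users could act on.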

--
  Aravindh Sampathkumar
  aravi...@fastmail.com


