This is the idea behind XDMod's SUPReMM. It does generate a ton of data
though, so it does not scale well to very active systems (i.e. ones
churning through tens of thousands of jobs).
https://github.com/ubccr/xdmod-supremm
-Paul Edmon-
On 12/9/2018 8:39 AM, Aravindh Sampathkumar wrote:
Hi All.
I was wondering if anybody has thought of or hacked around a way to
record CPU and memory consumption of a job during its entire duration
and give a summary of the usage pattern within that job?
Not the MaxRSS and CPU Time that already get reported for every job.
I'm thinking more like a chart of CPU utilisation, memory usage, and
disk usage on a per second basis or something like that.
Asking because some of my users have no clue about the resource
consumption of their jobs, and just blindly ask for way more resources
as a "safe" option. It would be a nice way for users to learn simple
things like - they asked for 8 cores, but their job ran on just 1 core
the entire time because a library they used is limited to a single core.
We use cgroups for process accounting and for limiting each job's CPU
and memory usage. We also use QoS to limit resource reservations at
the user level.
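
Since cgroups are already in place, one rough way to get a per-second
picture is to poll the job's cgroup counters and derive cores-in-use
from the deltas. The sketch below is only an illustration under stated
assumptions: it assumes cgroup v1, where cpuacct.usage holds cumulative
CPU time in nanoseconds and memory.usage_in_bytes holds current memory
usage, and it assumes the caller knows where the job's cgroup lives
(on many Slurm setups something like
/sys/fs/cgroup/cpuacct/slurm/uid_<uid>/job_<jobid>/, but that layout is
site-dependent). The function names are made up for this example and
are not part of Slurm or XDMod.

```python
import time
from pathlib import Path


def read_counter(path):
    """Read a single integer value from a cgroup pseudo-file."""
    return int(Path(path).read_text().strip())


def sample(cpu_file, mem_file, clock=time.monotonic):
    """Take one (timestamp_s, cumulative_cpu_ns, mem_bytes) sample.

    cpu_file is assumed to be a cgroup v1 cpuacct.usage file and
    mem_file a memory.usage_in_bytes file for the same job.
    """
    return (clock(), read_counter(cpu_file), read_counter(mem_file))


def collect(cpu_file, mem_file, n_samples, interval_s=1.0):
    """Poll the two counters n_samples times, interval_s seconds apart."""
    samples = []
    for _ in range(n_samples):
        samples.append(sample(cpu_file, mem_file))
        time.sleep(interval_s)
    return samples


def summarise(samples, cores_requested):
    """Summarise samples as core usage, peak memory and CPU efficiency.

    Cores in use over an interval = CPU seconds consumed / wall seconds.
    """
    per_interval = []
    for (t0, cpu0, _), (t1, cpu1, _) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed > 0:
            per_interval.append((cpu1 - cpu0) / 1e9 / elapsed)

    mean_cores = 0.0
    if len(samples) >= 2:
        total_elapsed = samples[-1][0] - samples[0][0]
        if total_elapsed > 0:
            total_cpu_s = (samples[-1][1] - samples[0][1]) / 1e9
            mean_cores = total_cpu_s / total_elapsed

    return {
        "cores_per_interval": per_interval,
        "mean_cores_used": mean_cores,
        "peak_cores_used": max(per_interval, default=0.0),
        "peak_mem_bytes": max((m for _, _, m in samples), default=0),
        "cpu_efficiency": mean_cores / cores_requested,
    }
```

Calling collect() with the two file paths for a running job and then
summarise() with the number of cores requested would flag the
"asked for 8 cores, used 1" case as a cpu_efficiency of roughly 0.125.
The cores_per_interval list is what a chart would be drawn from. Disk
usage is left out here, and polling every job once per second produces
a lot of data on a busy cluster.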
--
Aravindh Sampathkumar
aravi...@fastmail.com