Hi,
On 7/12/2018 6:23 PM, Bjørn-Helge Mevik wrote:
Raymond Wan writes:
However, a more general question... I thought there was no fool-proof
way to watch the amount of memory a job is using. What if, within the
script, they ran another program using "nohup", for example? Wouldn't
Slurm be unable to track it?
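(For what it's worth: if the cluster uses ProctrackType=proctrack/cgroup with the
cgroup memory controller, a nohup'ed child still stays in the job's cgroup, so its
memory is charged to the job. Here is a minimal Python sketch of what that accounting
sees for one job; the cgroup v1 layout /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/
is an assumption and varies between installations.)

#!/usr/bin/env python3
"""Rough look at what cgroup-based accounting sees for one running job.

Assumes a cgroup v1 layout of the kind proctrack/cgroup typically creates
(/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/); adjust paths as needed.
"""
import os
import sys

def report(uid, jobid):
    base = "/sys/fs/cgroup/memory/slurm/uid_%d/job_%d" % (uid, jobid)
    # Every PID listed here is charged to the job, including processes
    # that were started with nohup inside the batch script.
    with open(os.path.join(base, "cgroup.procs")) as fh:
        pids = [int(line) for line in fh if line.strip()]
    with open(os.path.join(base, "memory.usage_in_bytes")) as fh:
        current = int(fh.read())
    with open(os.path.join(base, "memory.max_usage_in_bytes")) as fh:
        peak = int(fh.read())
    print("job %d: %d tracked PIDs, current %.1f MiB, peak %.1f MiB"
          % (jobid, len(pids), current / 2**20, peak / 2**20))

if __name__ == "__main__":
    report(int(sys.argv[1]), int(sys.argv[2]))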
Hi All.
I was wondering if anybody has thought of or hacked together a way to
record CPU and memory consumption of a job during its entire duration
and give a summary of the usage pattern within that job? Not the MaxRSS
and CPU Time that already get reported for every job.
I'm thinking more like
This is the idea behind XDMoD's SUPReMM. It does generate a ton of data,
though, so it does not scale to very active systems (i.e. ones churning
through tens of thousands of jobs).
https://github.com/ubccr/xdmod-supremm
-Paul Edmon-
On 12/9/2018 8:39 AM, Aravindh Sampathkumar wrote:
For the simpler questions (for the overall job step, not real-time), you can run
'sacct --format=all' to get data on completed jobs, and then:
- compare the MaxRSS column to the ReqMem column to see how far off their
memory request was
- compare the TotalCPU column to the product of the NCPUS and Elapsed columns
to see how much of the allocated CPU time was actually used
(a scripted version of both checks is sketched below)
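A rough Python sketch of scripting both checks around sacct, given a job ID on the
command line. The field parsing is deliberately simplified (MaxRSS/ReqMem unit
suffixes, ReqMem's per-node/per-CPU flag, and the [D-]HH:MM:SS time format are
handled only in their common forms), so treat it as a starting point rather than
anything exact:

#!/usr/bin/env python3
"""Rough sacct-based efficiency check for a finished job."""
import subprocess
import sys

FIELDS = "JobID,ReqMem,MaxRSS,NCPUS,Elapsed,TotalCPU"

def to_mib(value):
    """Convert sacct memory strings like '4000Mn' or '123456K' to MiB."""
    value = value.rstrip("nc")            # strip ReqMem's per-node/per-CPU flag
    units = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 ** 2}
    if value and value[-1] in units:
        return float(value[:-1]) * units[value[-1]]
    return float(value or 0) / (1024 * 1024)   # assume plain bytes

def to_seconds(value):
    """Convert '[D-]HH:MM:SS[.mmm]' (or 'MM:SS') to seconds."""
    days, _, rest = value.partition("-") if "-" in value else ("0", "", value)
    parts = [float(p) for p in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0.0)
    h, m, s = parts
    return int(days) * 86400 + h * 3600 + m * 60 + s

def report(jobid):
    out = subprocess.run(
        ["sacct", "-j", str(jobid), "--noheader", "--parsable2",
         "--format=" + FIELDS],
        check=True, capture_output=True, text=True).stdout
    for line in out.splitlines():
        job, reqmem, maxrss, ncpus, elapsed, totalcpu = line.split("|")
        if not maxrss:                    # skip the allocation line, keep steps
            continue
        alloc = to_seconds(elapsed) * int(ncpus)
        print("%s: MaxRSS %.0f MiB of %.0f MiB requested, CPU %.0f%% of allocation"
              % (job, to_mib(maxrss), to_mib(reqmem),
                 100 * to_seconds(totalcpu) / max(alloc, 1)))

if __name__ == "__main__":
    report(sys.argv[1])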
Hi Aravindh
For our small 3-node cluster I've hacked together a per-node Python script that
collects current and peak CPU, memory and scratch disk usage data on all jobs
running on the cluster and builds a fairly simple web page based on it. It
shouldn't be hard to make it store those data points.
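Not the actual script, but a minimal sketch of that general approach: poll each
job's cgroups on the node at an interval and keep running peaks. The cgroup v1
paths, the 30-second interval and the CSV output location are all assumptions:

#!/usr/bin/env python3
"""Minimal per-node collector sketch: poll each job's cgroups and keep
running peaks for memory and cumulative CPU time."""
import csv
import glob
import os
import re
import time

INTERVAL = 30          # seconds between samples (arbitrary choice)
OUTFILE = "/var/tmp/job_usage.csv"
peaks = {}             # jobid -> {"mem": bytes, "cpu": nanoseconds}

def read_int(path):
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except OSError:
        return 0           # job may have ended between glob and read

def sample():
    for memdir in glob.glob("/sys/fs/cgroup/memory/slurm/uid_*/job_*"):
        jobid = re.search(r"job_(\d+)$", memdir).group(1)
        cpudir = memdir.replace("/memory/", "/cpuacct/")
        entry = peaks.setdefault(jobid, {"mem": 0, "cpu": 0})
        entry["mem"] = max(entry["mem"],
                           read_int(os.path.join(memdir, "memory.usage_in_bytes")))
        entry["cpu"] = max(entry["cpu"],
                           read_int(os.path.join(cpudir, "cpuacct.usage")))

def dump():
    with open(OUTFILE, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["jobid", "peak_mem_mib", "cpu_seconds"])
        for jobid, entry in sorted(peaks.items()):
            writer.writerow([jobid,
                             round(entry["mem"] / 2**20, 1),
                             round(entry["cpu"] / 1e9, 1)])

if __name__ == "__main__":
    while True:
        sample()
        dump()
        time.sleep(INTERVAL)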
Hi Ken,
Here is my slurm.conf:
ControlMachine=s19r2b08
AuthType=auth/none
CryptoType=crypto/openssl
JobCredentialPrivateKey=/home/bsc33/bsc33882/slurm_over_slurm/etc/slurm.key
JobCredentialPublicCertificate=/home/bsc33/bsc33882/slurm_over_slurm/etc/slurm.cert
MpiDefault=none
ProctrackTyp