FYI: I have updated my Slurm tools for displaying the user processes of batch jobs. Besides the user process list from "ps", they now also print a summary of the number of processes and threads. We use this to check that user jobs are behaving sensibly. For example, we often see jobs that run too many threads per process and overload the CPUs.

The tools are:

* psjob <jobid>      for all user processes in a job
* psnode <nodelist>  for all user processes on a node or list of nodes

Download the psjob and psnode tools from:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes
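
To illustrate the kind of process/thread summary described above, here is a minimal Python sketch. It is not how psjob or psnode are implemented (those are separate tools from the repository above). It only assumes text in the column layout produced by the procps command "ps -eo user,pid,nlwp,comm", where NLWP is the number of threads in each process. The names summarize and SAMPLE, and the users and commands in the sample, are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample in the layout of `ps -eo user,pid,nlwp,comm`.
SAMPLE = """\
USER       PID NLWP COMMAND
alice     1201    1 bash
alice     1250   64 python
bob       1302    4 vasp_std
"""

def summarize(ps_output):
    """Return {user: (process_count, thread_count)} from ps-style text."""
    totals = defaultdict(lambda: [0, 0])
    # Skip the header line, then read one process per line.
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # ignore malformed lines
        user, _pid, nlwp = fields[:3]
        totals[user][0] += 1          # one more process for this user
        totals[user][1] += int(nlwp)  # add this process's thread count
    return {user: tuple(counts) for user, counts in totals.items()}

for user, (nproc, nthreads) in sorted(summarize(SAMPLE).items()):
    print(f"{user}: {nproc} processes, {nthreads} threads")
```

In this made-up sample, one process owned by alice runs 64 threads, which is the sort of pattern that may indicate an overloaded node if the job was allocated fewer cores.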

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
