Hi,
There were two cases where this happened to us as well:
1. The systemd slurmd.service unit wasn't configured properly, so the jobs
ran under the slurmd.slice. When slurmd is restarted in that situation,
systemd sends a signal to all processes in it. You can check if this is the case with 'systemctl
status slurmd.se
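For what it's worth, the same check can be made per process by looking at its cgroup membership. Below is a minimal sketch in Python, assuming the standard "hierarchy-ID:controller-list:path" line format of /proc/<pid>/cgroup. The function names, the default unit names and the sample paths are illustrative assumptions, not taken from any Slurm tool.

```python
def cgroup_paths(cgroup_text):
    """Return the cgroup paths from text in /proc/<pid>/cgroup format."""
    paths = []
    for line in cgroup_text.splitlines():
        # Each line looks like "hierarchy-ID:controller-list:path".
        parts = line.strip().split(":", 2)
        if len(parts) == 3:
            paths.append(parts[2])
    return paths


def runs_under_slurmd_unit(cgroup_text,
                           unit_names=("slurmd.service", "slurmd.slice")):
    """True if any cgroup path has a component named like the slurmd unit.

    The default unit_names are an assumption; adjust them to your site.
    """
    for path in cgroup_paths(cgroup_text):
        components = path.split("/")
        if any(name in components for name in unit_names):
            return True
    return False


# Illustrative sample inputs (hypothetical paths, not real output).
inside = "0::/system.slice/slurmd.service\n"
outside = "0::/system.slice/slurmstepd.scope/job_123/step_0\n"
```

On a compute node you would pass in the contents of /proc/<pid>/cgroup for a job's process. A True result suggests the process sits in the daemon's own cgroup and so would be signalled together with slurmd.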
I’m trying to replicate the setup of a new account where there is a new
“grouping” of accounts and a new account that will actually be used, so
something like this when you run
sacctmgr show assoc tree
mycluster account1. (which is just being used to group accounts
and so has no GrpTRE
Hello all,
I made some modifications to my slurm.conf and restarted slurmctld on
the master and then slurmd on the nodes.
During this process I lost some jobs (*); curiously, all of these jobs were on
Ubuntu nodes.
These jobs were fine in terms of consumed resources (**).
Any idea what co
Hi!
I need your help.
How can I use checkpointing (DMTCP) with Slurm?
Thanks in advance,
Angelines