Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-30 Thread Felix Wolfheimer
After a bit more testing I can answer my original question: I was just too impatient. When the ResumeProgram returns a non-zero exit code, SLURM doesn't taint the node; i.e., it tries to start it again after a while. Exactly what I want! :-) @Lachlan Musicman: My slurm.conf Node and Partition
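
To illustrate the behaviour described above, here is a minimal sketch of the exit-code logic a ResumeProgram might use. It is not the script from this thread: `resume_nodes` and the `start_node` callback are hypothetical names, and the cloud-provider call and the expansion of the node list that Slurm passes as an argument are left out. The only point is that a failed start turns into a non-zero exit code, which, per the observation above, leads SLURM to try the node again later.

```python
import sys
from typing import Callable, Iterable


def resume_nodes(hostnames: Iterable[str], start_node: Callable[[str], bool]) -> int:
    """Try to start every node and return a process exit code (0 = all started).

    `start_node` is a hypothetical callback standing in for whatever the
    cloud provider needs; it returns True on success.
    """
    failed = []
    for name in hostnames:
        try:
            ok = start_node(name)
        except Exception as exc:  # treat any provider error as a failed start
            print(f"resume failed for {name}: {exc}", file=sys.stderr)
            ok = False
        if not ok:
            failed.append(name)
    # A non-zero code signals to the caller that at least one node did not start.
    return 1 if failed else 0


# Assumed usage inside the actual script (names are illustrative only):
#   sys.exit(resume_nodes(expanded_hostnames, start_cloud_instance))
```

Keeping the decision in a plain function that returns the code, rather than calling `sys.exit` deep inside the logic, makes it easy to test without launching anything.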

Re: [slurm-users] restart slurmd on nodes w/ running jobs?

2018-07-30 Thread Prentice Bisbal
Paul and Chris, Thanks for the information. This is the first time I had a reason to restart the slurmd processes (instead of just 'scontrol reconfigure') outside of a maintenance window, and I wanted to be 100% sure before risking killing all the user jobs on a Friday afternoon. I'm happy to say