I recently took over running a Slurm cluster that, among other things, allows
users to reboot nodes so they can be reconfigured for different kinds of
tests. This is done through the RebootProgram config setting.
But in the test environment I set up, we seem to have lost that capability.
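A minimal Python sketch for checking what a cluster is actually configured with: it scans slurm.conf-style text and returns the value of a parameter such as RebootProgram. It assumes Key=Value tokens separated by whitespace, # comments and case-insensitive parameter names. It does not follow Include lines, and the helper find_param and the sample text are hypothetical.

```python
import shlex
from typing import Optional


def find_param(conf_text: str, key: str) -> Optional[str]:
    """Return the last value assigned to `key` in slurm.conf-style text, or None."""
    value = None
    for line in conf_text.splitlines():
        # shlex drops '#' comments and strips quotes, e.g. RebootProgram="/sbin/reboot"
        for token in shlex.split(line, comments=True):
            name, sep, val = token.partition("=")
            if sep and name.lower() == key.lower():
                value = val  # a later assignment overrides an earlier one
    return value


# Hypothetical excerpt, not taken from the cluster in this thread
sample = """
# test cluster
SlurmctldHost=head01
RebootProgram="/sbin/reboot"
AccountingStorageType=accounting_storage/slurmdbd
"""

print(find_param(sample, "RebootProgram"))
print(find_param(sample, "AccountingStorageType"))
```

The same helper can confirm that AccountingStorageType points at slurmdbd, which this thread concludes is needed before job-initiated reboots work.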
Yeah, it looks like if we still want to do this, I need to set up slurmdbd and an
accounting database.
> On Jun 6, 2023, at 5:07 PM, Christopher Samuel wrote:
>
> On 6/6/23 1:33 pm, Heinz, Michael wrote:
>
>> I've gone through the man pages for slur
Hey, all.
So I added slurmdbd to our slurm-23.02 install and made my account an admin,
but when I try an srun with --reboot, it just sits forever: no errors, nothing
in the logs. The node stays in "CF" state until I cancel the job, set the node
to down, and set it back to idle.
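The two administrative steps mentioned above (making an account a Slurm admin, and recovering the stuck node) can be written out as argument lists, for example to pass to subprocess.run on a host with the Slurm client tools. This is a sketch: the helper names, the user alice and the node node01 are hypothetical, and the code only builds the commands rather than running them.

```python
import shlex
from typing import List


def make_admin_cmd(user: str) -> List[str]:
    # Grant admin rights in the accounting database; -i skips the confirmation prompt
    return ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
            "set", "AdminLevel=Admin"]


def reset_node_cmds(node: str, reason: str) -> List[List[str]]:
    # Mark the stuck node DOWN (scontrol expects a Reason for this), then RESUME it
    return [
        ["scontrol", "update", f"NodeName={node}", "State=DOWN", f"Reason={reason}"],
        ["scontrol", "update", f"NodeName={node}", "State=RESUME"],
    ]


print(shlex.join(make_admin_cmd("alice")))
for cmd in reset_node_cmds("node01", "stuck reboot"):
    print(shlex.join(cmd))
```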
by the user.
This is usually /sbin/reboot.
Brian Andrus
On 6/7/2023 7:50 AM, Heinz, Michael wrote:
I should also note that scontrol reboot works fine, but srun/salloc/sbatch with --reboot hang.
Michael Heinz
End-to-End Network Software Engineer
michael.he...@intel.com
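To reproduce the difference described above (scontrol reboot works, a job submitted with --reboot hangs), the two invocations can be laid out side by side. A hedged sketch: node01 and the hostname payload are placeholders, and the helpers only build argument lists for a shell or subprocess.run.

```python
import shlex
from typing import List


def scontrol_reboot_cmd(nodes: str) -> List[str]:
    # Admin path: ask Slurm to reboot the nodes using the configured RebootProgram
    return ["scontrol", "reboot", nodes]


def srun_reboot_cmd(nodes: str, program: str = "hostname") -> List[str]:
    # Job path: request a reboot of the allocated nodes before the step runs
    return ["srun", "--reboot", f"--nodelist={nodes}", program]


for cmd in (scontrol_reboot_cmd("node01"), srun_reboot_cmd("node01")):
    print(shlex.join(cmd))
```

If the first command succeeds while the second leaves the job sitting in CF state, that would suggest the problem lies in the job-initiated reboot path rather than in RebootProgram itself.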
From: Heinz, Michael
Sent: Thursday, June 8, 2023 9:00 AM
To: slurm-users@lists.schedmd.com
Subje