Hello, I am having trouble adding new nodes to a Slurm cluster without killing the jobs that are currently running.
Right now I do the following:
1. Update slurm.conf and add the new node to it.
2. Copy the new slurm.conf to all the nodes.
3. Restart slurmd on all nodes.
4. Restart slurmctld.

But when I restart slurmctld, all the jobs that were running get requeued and show (BeginTime) as the reason for not running. The newly added node works perfectly fine.

I've included the slurm.conf. I've also included the slurmctld.log output from when I try to add the new node.

Cheers,
Jin
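For reference, the four steps above can be written out as a dry run in Python. This is only a sketch of the described procedure, not a fix for the requeue problem. The config path `/etc/slurm/slurm.conf`, the hostnames, the node hardware values, and the use of `scp`, `ssh` and `systemctl` are all assumptions (none of them come from the attached files). The functions only build the edited config text and the command list; they never run anything.

```python
"""Dry-run sketch of the node-addition procedure (steps 1-4).

Hypothetical assumptions: the config lives at /etc/slurm/slurm.conf on every
host, hosts are reached with scp/ssh, and the daemons run as systemd units
named slurmd and slurmctld. Nothing here is executed; commands are only built.
"""
from typing import List

CONF_PATH = "/etc/slurm/slurm.conf"  # assumed location; varies between installs


def add_node(conf_text: str, node_line: str, partition: str, node_name: str) -> str:
    """Step 1: add a NodeName line and append the node to one partition's Nodes= list."""
    lines = conf_text.splitlines()
    # Insert the new NodeName line right after the last existing one
    # (or at the end if there is none), keeping node definitions together.
    last_node = max(
        (i for i, line in enumerate(lines) if line.startswith("NodeName=")),
        default=len(lines) - 1,
    )
    lines.insert(last_node + 1, node_line)
    # Append ",<node_name>" to the Nodes= field of the chosen partition.
    # Assumes Nodes= holds a hostlist such as node[01-04], not the keyword ALL.
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == f"PartitionName={partition}":
            lines[i] = " ".join(
                f + "," + node_name if f.startswith("Nodes=") else f
                for f in fields
            )
    return "\n".join(lines) + "\n"


def restart_plan(compute_nodes: List[str], controller: str) -> List[List[str]]:
    """Steps 2-4: copy the config everywhere, restart slurmd, then slurmctld."""
    plan: List[List[str]] = []
    # Step 2: copy the edited slurm.conf to every compute node and the controller.
    for host in compute_nodes + [controller]:
        plan.append(["scp", CONF_PATH, f"{host}:{CONF_PATH}"])
    # Step 3: restart slurmd on all compute nodes (including the new one).
    for host in compute_nodes:
        plan.append(["ssh", host, "systemctl", "restart", "slurmd"])
    # Step 4: restart slurmctld on the controller last.
    plan.append(["ssh", controller, "systemctl", "restart", "slurmctld"])
    return plan


if __name__ == "__main__":
    example_conf = (
        "NodeName=node[01-04] CPUs=16 State=UNKNOWN\n"
        "PartitionName=batch Nodes=node[01-04] Default=YES State=UP\n"
    )
    print(add_node(example_conf, "NodeName=node05 CPUs=16 State=UNKNOWN", "batch", "node05"))
    for cmd in restart_plan(["node01", "node02", "node03", "node04", "node05"], "ctl01"):
        print(" ".join(cmd))
```

With the example config, the partition line becomes `PartitionName=batch Nodes=node[01-04],node05 Default=YES State=UP`. Keeping the plan as data rather than running it makes it easy to review the exact order of restarts before touching a live cluster.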
slurmctld.log
slurmdbd.conf