Hi Jin,

I always do your steps 3 and 4 in the opposite order: restart slurmctld first, then slurmd on the nodes:

> 3. Restart the slurmd on all nodes
> 4. Restart the slurmctld

Since you're running the very old Slurm 15.08, you should consider upgrading in steps: 15.08 -> 16.05 -> 17.02. A 17.11 release is coming soon. FYI, I wrote some notes about upgrading: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
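
As a rough sketch of that order (not a tested procedure), the small Python script below only builds and prints the commands; it does not run anything. The host name "head", the path /etc/slurm/slurm.conf, and the use of scp, ssh and systemctl are all assumptions for illustration. On a 15.08 installation the daemons may well be managed by an init script instead of systemd units, so adjust accordingly.

```python
from typing import List


def restart_plan(controller: str, nodes: List[str],
                 conf: str = "/etc/slurm/slurm.conf") -> List[str]:
    """Build the shell commands for adding nodes, in the suggested order.

    Nothing is executed here; the caller decides what to do with the list.
    The path and the service names are assumptions, not Slurm requirements.
    """
    plan: List[str] = []
    # Copy the already-updated slurm.conf to every compute node first.
    for node in nodes:
        plan.append(f"scp {conf} {node}:{conf}")
    # Restart the controller before any slurmd.
    plan.append(f"ssh {controller} systemctl restart slurmctld")
    # Only then restart slurmd on each compute node.
    for node in nodes:
        plan.append(f"ssh {node} systemctl restart slurmd")
    return plan


if __name__ == "__main__":
    # "head" is a hypothetical controller host name.
    for command in restart_plan("head", ["compute003", "compute004"]):
        print(command)
```

Printing the plan first makes it easy to review the order before running anything by hand.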

/Ole



On 10/23/2017 02:55 PM, JinSung Kang wrote:
Hi

Thanks everyone for your response. I have also tested my setup to remove nodes from the cluster, and the same thing happens.

*To answer some of the previous questions.*
The "Node compute004 appears to have a different slurm.conf than the slurmctld" error comes up when I replace slurm.conf on all the nodes, but it goes away when I restart slurmctld.
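
One way to rule out a bad copy is to compare checksums of the slurm.conf contents gathered from each node. The sketch below is a manual sanity check, independent of whatever comparison Slurm does internally; the function names are made up for illustration, and it assumes the file contents have already been fetched into memory somehow.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a config file's contents."""
    return hashlib.sha256(data).hexdigest()


def mismatched_nodes(reference: bytes, copies: Dict[str, bytes]) -> List[str]:
    """List the nodes whose slurm.conf differs from the reference copy."""
    expected = digest(reference)
    return sorted(node for node, data in copies.items()
                  if digest(data) != expected)


if __name__ == "__main__":
    # Toy contents standing in for real slurm.conf files.
    controller_conf = b"NodeName=compute[001-004]\n"
    node_confs = {
        "compute003": b"NodeName=compute[001-004]\n",
        "compute004": b"NodeName=compute[001-003]\n",  # stale copy
    }
    print(mismatched_nodes(controller_conf, node_confs))
```

An empty list means every copy is byte-for-byte identical to the controller's file.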

The Slurm version I'm running is 15.08.7.

I've included the slurm.conf rather than slurmdbd.conf.

Cheers,

Jin


On Mon, Oct 23, 2017 at 8:25 AM Ole Holm Nielsen <[email protected]> wrote:


    Hi Jin,

    Your slurmctld.log says "Node compute004 appears to have a different
    slurm.conf than the slurmctld" etc.  This will happen if you didn't copy
    the slurm.conf to the nodes correctly.  Please correct this
    potential error.

    Also, please specify which version of Slurm you're running.

    /Ole

    On 10/22/2017 08:44 PM, JinSung Kang wrote:
     > I am having trouble adding new nodes to a Slurm cluster without
     > killing the jobs that are currently running.
     >
     > Right now I
     >
     > 1. Update the slurm.conf and add a new node to it
     > 2. Copy new slurm.conf to all the nodes,
     > 3. Restart the slurmd on all nodes
     > 4. Restart the slurmctld
     >
     > But when I restart slurmctld, all the jobs that were running are
     > requeued, with (Begin Time) shown as the reason for not running. The
     > newly added node works perfectly fine.
     >
     > I've included the slurm.conf. I've also included slurmctld.log output
     > when I'm trying to add the new node.
