Paul and Chris,

Thanks for the information. This is the first time I've had a reason to restart the slurmd processes (instead of just 'scontrol reconfigure') outside of a maintenance window, and I wanted to be 100% sure before risking killing all the user jobs on a Friday afternoon.

I'm happy to say the operation was a success.

Prentice

On 07/27/2018 08:47 PM, Paul Edmon wrote:

Restarting slurmd should be fine, assuming the daemons come back before the communications time out. I restart slurmd all the time and haven't had any real problems.

-Paul Edmon-


On 7/27/2018 6:51 PM, Chris Harwell wrote:
It is possible, but double check your config for timeouts first.
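
As a rough sketch of that timeout check (not something posted in this thread): the Python below pulls every "*Timeout" setting out of the text printed by 'scontrol show config'. The helper name parse_timeouts and the SAMPLE values are made up for illustration, and the "Name = N sec" line layout is assumed from typical output. SlurmdTimeout is the setting most relevant here, since it governs how long slurmctld waits for a slurmd to respond before marking the node DOWN.

```python
import re
import subprocess


def parse_timeouts(config_text):
    """Return {name: seconds} for each '*Timeout = N' line in the given text."""
    timeouts = {}
    for line in config_text.splitlines():
        # Assumed layout: "SlurmdTimeout           = 300 sec"
        match = re.match(r"\s*(\w*Timeout)\s*=\s*(\d+)", line)
        if match:
            timeouts[match.group(1)] = int(match.group(2))
    return timeouts


# Illustrative values only; not taken from any real cluster.
SAMPLE = """\
MessageTimeout          = 10 sec
SlurmctldTimeout        = 120 sec
SlurmdTimeout           = 300 sec
"""

if __name__ == "__main__":
    try:
        # On a Slurm host, read the live configuration.
        text = subprocess.run(
            ["scontrol", "show", "config"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # Fall back to the sample when scontrol is unavailable.
        text = SAMPLE
    for name, seconds in sorted(parse_timeouts(text).items()):
        print(f"{name}: {seconds} s")
```

If the restart on each node finishes well inside the SlurmdTimeout value reported, the controller should not mark the node DOWN in the meantime.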

On Fri, Jul 27, 2018, 15:31 Prentice Bisbal <pbis...@pppl.gov> wrote:

    Slurm-users,

    I'm still learning Slurm, so I have what I think is a basic question.
    Can you restart slurmd on nodes where jobs are running, or will that
    kill the jobs? I ran into the same problem as described here:

    https://bugs.schedmd.com/show_bug.cgi?id=3535

    I believe the best way to fix this is to restart slurmd on all my
    nodes, but I've been unable to determine conclusively whether I can
    do that w/o killing running jobs. I've spent some time googling this,
    but couldn't find a definitive answer one way or the other. I prefer
    to not kill a bunch of user jobs on a Friday afternoon.

-- Prentice


--
Chris Harwell

