Re: [slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

2021-06-24 Thread Josef Dvoracek
> I thought setting partitions to DOWN will kill jobs? No, it just prevents new jobs in the queue from starting in the given partition. josef On 24. 06. 21 11:26, Tina Friedrich wrote: I thought setting partitions to DOWN will kill jobs? Amjad - in my experience, the slurmdbd & slurmctld server can

Re: [slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

2021-06-24 Thread Tina Friedrich
I thought setting partitions to DOWN will kill jobs? Amjad - in my experience, the slurmdbd & slurmctld server can be rebooted with no effect on running jobs. You can't submit whilst it's down, and I'm not precisely sure what happens to jobs that are just finishing - but really the impact shou

Re: [slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

2021-06-24 Thread Josef Dvoracek
Hi, just set the partitions to "DOWN" to avoid unexpected behavior for users and reboot the slurm(ctl|dbd)+sql box. In my experience, running jobs are not affected. No need to drain nodes. josef On 24. 06. 21 0:54, Amjad Syed wrote: Hello all, We have a cluster running CentOS 7. Our slurm 
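
A minimal sketch of the step Josef describes, assuming the standard `scontrol update PartitionName=<name> State=DOWN` form and two hypothetical partition names (`batch` and `gpu`, which do not come from the thread). It only builds and prints the commands so they can be reviewed before being run by hand on the controller; setting `State=UP` after the reboot reverses the change.

```python
import shlex


def partition_state_commands(partitions, state):
    """Build one `scontrol update` command per partition.

    `state` must be "UP" or "DOWN". Nothing is executed here; the
    commands are returned as argument lists for review.
    """
    if state not in ("UP", "DOWN"):
        raise ValueError(f"unsupported partition state: {state!r}")
    return [
        ["scontrol", "update", f"PartitionName={name}", f"State={state}"]
        for name in partitions
    ]


# Hypothetical partition names, for illustration only.
PARTITIONS = ["batch", "gpu"]

# Before the reboot: stop new jobs from starting.
for cmd in partition_state_commands(PARTITIONS, "DOWN"):
    print(shlex.join(cmd))

# After the reboot: let queued jobs start again.
for cmd in partition_state_commands(PARTITIONS, "UP"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; on a real cluster the printed lines would be pasted into a root shell on the slurmctld host.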

Re: [slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

2021-06-23 Thread Barbara Krašovec
Just in case, increase SlurmdTimeout in slurm.conf (so that when the controller is back, it will give you time to fix any communication issues between slurmd and slurmctld). Otherwise it should not affect running and pending jobs. First stop controller, then slur
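
A rough sketch of the SlurmdTimeout change suggested above, assuming slurm.conf uses plain `Key=Value` lines. The helper name `set_slurmd_timeout` and the value 3600 are illustrative and not from the thread. It works on the file contents as a string, so the result can be inspected before anything is written back; the running daemons still need to pick up the new value (for example with `scontrol reconfigure` or a restart).

```python
import re

# Matches an active (not commented-out) SlurmdTimeout line.
_SLURMD_TIMEOUT = re.compile(
    r"^[ \t]*SlurmdTimeout[ \t]*=.*$", re.IGNORECASE | re.MULTILINE
)


def set_slurmd_timeout(conf_text, seconds):
    """Return slurm.conf text with SlurmdTimeout set to `seconds`.

    Replaces an existing active setting, or appends one if none exists.
    """
    line = f"SlurmdTimeout={seconds}"
    if _SLURMD_TIMEOUT.search(conf_text):
        return _SLURMD_TIMEOUT.sub(line, conf_text)
    separator = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return conf_text + separator + line + "\n"


# Illustrative input only; a real slurm.conf has many more settings.
example = "ClusterName=demo\nSlurmdTimeout=300\nSlurmctldTimeout=120\n"
print(set_slurmd_timeout(example, 3600))
```

The pattern deliberately skips lines beginning with `#`, so a commented-out default is left alone and a new active line is appended instead.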

[slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

2021-06-23 Thread Amjad Syed
Hello all, We have a cluster running CentOS 7. Our Slurm scheduler is running on a VM and we are running out of disk space for /var. The Slurm InnoDB data is taking most of the space. We intend to expand the vdisk for the Slurm server. This will require a reboot for changes to take effect. D