Re: [slurm-users] Power save doesn't start nodes

2018-07-18 Thread Michael Gutteridge
John: thanks for the link. Curiously, sinfo doesn't show the asterisk but documents it, while scontrol shows the asterisk but doesn't document it... at least for the state my cluster is in. Antony: thanks for the steps. I tried them out, but there was no change. It seems like it should do the tri

Re: [slurm-users] Power save doesn't start nodes

2018-07-18 Thread John Hearns
If it is any help, https://slurm.schedmd.com/sinfo.html says under NODE STATE CODES: "Node state codes are shortened as required for the field size. These node states may be followed by a special character to identify state flags associated with the node. The following node suffixes and states are used:" ***

Re: [slurm-users] Power save doesn't start nodes

2018-07-18 Thread Antony Cleave
I've not seen the IDLE* issue before, but when my nodes got stuck I've always been able to fix them with this:
[root@cloud01 ~]# scontrol update nodename=cloud01 state=down reason=stuck
[root@cloud01 ~]# scontrol update nodename=cloud01 state=idle
[root@cloud01 ~]# scontrol update nodename=cloud01
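For several stuck nodes, the first two commands quoted above could be wrapped in a small shell function. This is only a sketch: `reset_node` and the `DRY_RUN` variable are hypothetical names introduced here, and the third command is cut off in the archive, so it is deliberately left out.

```shell
#!/bin/sh
# Sketch only: mark a node DOWN, then return it to IDLE, mirroring the
# first two scontrol commands quoted in the message above.
# reset_node and DRY_RUN are hypothetical names, not part of Slurm.
reset_node() {
    node="$1"
    run=""
    # With DRY_RUN=1 the commands are printed instead of executed.
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    $run scontrol update nodename="$node" state=down reason=stuck &&
    $run scontrol update nodename="$node" state=idle
}
```

Running `DRY_RUN=1 reset_node cloud01` prints the two commands without touching the cluster; dropping `DRY_RUN` runs them for real and needs the same privileges as the root session shown above.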

[slurm-users] Power save doesn't start nodes

2018-07-17 Thread Michael Gutteridge
Hi, I'm running a cluster in a cloud provider and have run up against an odd problem with power save. I've got several hundred nodes that Slurm won't power up even though they appear idle and in the powered-down state. I suspect that they are in a "not-so-idle" state: `scontrol` for all of the no