Hi Rafał,

you may try setting `ReturnToService=2` in slurm.conf.
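For reference, a minimal sketch of the relevant slurm.conf line, assuming the rest of the file stays unchanged and the same file is used on the manager and on all nodes. The comments summarize the three values as described in the slurm.conf man page. With a value of 2, a DOWN node becomes available again as soon as it registers with a valid configuration, whatever the reason it was set DOWN (including "Node unexpectedly rebooted"):

```
# slurm.conf (excerpt)
# 0: a DOWN node stays DOWN until an administrator resumes it
# 1: a DOWN node returns automatically only if it was set DOWN for being non-responsive
# 2: a DOWN node returns to service on registration with a valid configuration
ReturnToService=2
```

After editing the file, slurmctld has to re-read the configuration, for example with `scontrol reconfigure`, or by restarting the daemon if that turns out not to be sufficient on your version.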
Best regards
Jürgen

--
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Phone: +49 (0)731 50-22478
Fax: +49 (0)731 50-22471

* Rafał Kędziorski <rafal.kedzior...@gmail.com> [190927 10:36]:
> Hi Andreas,
>
> my cluster is not running all the time. I just call sudo shutdown, and
> after boot the nodes are in state down.
>
> I'm using Slurm on a Raspi cluster (5 × Pi 4). What is the best way to
> shut down the nodes so that after boot they are idle and not down?
>
> Regards,
> Rafal
>
> On Fri, 27 Sep 2019 at 08:43, Henkel, Andreas <hen...@uni-mainz.de> wrote:
>
> > Hi Rafal,
> >
> > How do you restart the nodes? If you don’t use scontrol reboot <node>,
> > Slurm doesn’t expect the nodes to reboot, which is why you see that
> > reason in those cases.
> >
> > Best
> > Andreas
> >
> > On 27.09.2019 at 07:53, Rafał Kędziorski <rafal.kedzior...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm working with slurm-wlm 18.08.5-2 on a Raspberry Pi cluster:
> >
> > - 1 Pi 4 as manager
> > - 4 Pi 4 nodes
> >
> > This works fine. But after every restart of the nodes I get this state:
> >
> > cluster@pi-manager:~ $ sinfo
> > PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
> > devcluster* up     infinite   4      down  pi-4-node-[1-4]
> >
> > Then I can call
> >
> > sudo scontrol update NodeName=<node_name> State=RESUME
> >
> > for every node. Sometimes all nodes are idle afterwards, and sometimes
> > some are still down:
> >
> > cluster@pi-manager:~ $ sinfo
> > PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
> > devcluster* up     infinite   2      idle  pi-4-node-[1-2]
> > devcluster* up     infinite   2      down  pi-4-node-[3-4]
> >
> > Status of all nodes:
> >
> > cluster@pi-manager:~ $ scontrol show nodes
> > NodeName=pi-4-node-1 Arch=armv7l CoresPerSocket=1
> >    CPUAlloc=0 CPUTot=4 CPULoad=0.24
> >    AvailableFeatures=(null)
> >    ActiveFeatures=(null)
> >    Gres=(null)
> >    NodeAddr=192.168.178.141 NodeHostName=pi-4-node-1 Version=18.08
> >    OS=Linux 4.19.66-v7l+ #1253 SMP Thu Aug 15 12:02:08 BST 2019
> >    RealMemory=1 AllocMem=0 FreeMem=3687 Sockets=4 Boards=1
> >    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> >    Partitions=devcluster
> >    BootTime=2019-09-19T17:38:58 SlurmdStartTime=2019-09-19T00:26:36
> >    CfgTRES=cpu=4,mem=1M,billing=4
> >    AllocTRES=
> >    CapWatts=n/a
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >
> >
> > NodeName=pi-4-node-2 Arch=armv7l CoresPerSocket=1
> >    CPUAlloc=0 CPUTot=4 CPULoad=0.06
> >    AvailableFeatures=(null)
> >    ActiveFeatures=(null)
> >    Gres=(null)
> >    NodeAddr=192.168.178.142 NodeHostName=pi-4-node-2 Version=18.08
> >    OS=Linux 4.19.66-v7l+ #1253 SMP Thu Aug 15 12:02:08 BST 2019
> >    RealMemory=1 AllocMem=0 FreeMem=3687 Sockets=4 Boards=1
> >    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> >    Partitions=devcluster
> >    BootTime=2019-09-19T17:38:57 SlurmdStartTime=2019-09-19T00:26:49
> >    CfgTRES=cpu=4,mem=1M,billing=4
> >    AllocTRES=
> >    CapWatts=n/a
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >
> >
> > NodeName=pi-4-node-3 Arch=armv7l CoresPerSocket=1
> >    CPUAlloc=0 CPUTot=4 CPULoad=0.02
> >    AvailableFeatures=(null)
> >    ActiveFeatures=(null)
> >    Gres=(null)
> >    NodeAddr=192.168.178.143 NodeHostName=pi-4-node-3 Version=18.08
> >    OS=Linux 4.19.66-v7l+ #1253 SMP Thu Aug 15 12:02:08 BST 2019
> >    RealMemory=1 AllocMem=0 FreeMem=3676 Sockets=4 Boards=1
> >    State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> >    Partitions=devcluster
> >    BootTime=2019-09-19T17:38:55 SlurmdStartTime=2019-09-19T00:26:45
> >    CfgTRES=cpu=4,mem=1M,billing=4
> >    AllocTRES=
> >    CapWatts=n/a
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >    Reason=Node unexpectedly rebooted [slurm@2019-09-19T17:39:32]
> >
> > NodeName=pi-4-node-4 Arch=armv7l CoresPerSocket=1
> >    CPUAlloc=0 CPUTot=4 CPULoad=0.02
> >    AvailableFeatures=(null)
> >    ActiveFeatures=(null)
> >    Gres=(null)
> >    NodeAddr=192.168.178.144 NodeHostName=pi-4-node-4 Version=18.08
> >    OS=Linux 4.19.66-v7l+ #1253 SMP Thu Aug 15 12:02:08 BST 2019
> >    RealMemory=1 AllocMem=0 FreeMem=3687 Sockets=4 Boards=1
> >    State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> >    Partitions=devcluster
> >    BootTime=2019-09-19T17:38:52 SlurmdStartTime=2019-09-19T00:26:47
> >    CfgTRES=cpu=4,mem=1M,billing=4
> >    AllocTRES=
> >    CapWatts=n/a
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >    Reason=Node unexpectedly rebooted [slurm@2019-09-19T17:39:30]
> >
> > NodeName=pi-manager Arch=armv7l CoresPerSocket=1
> >    CPUAlloc=0 CPUTot=4 CPULoad=0.00
> >    AvailableFeatures=(null)
> >    ActiveFeatures=(null)
> >    Gres=(null)
> >    NodeAddr=192.168.178.140 NodeHostName=pi-manager Version=18.08
> >    OS=Linux 4.19.66-v7l+ #1253 SMP Thu Aug 15 12:02:08 BST 2019
> >    RealMemory=1 AllocMem=0 FreeMem=3446 Sockets=4 Boards=1
> >    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> >    BootTime=2019-09-19T17:35:48 SlurmdStartTime=2019-09-19T08:10:51
> >    CfgTRES=cpu=4,mem=1M,billing=4
> >    AllocTRES=
> >    CapWatts=n/a
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >
> > For the nodes that are down, the reason is:
> >
> > Reason=Node unexpectedly rebooted [slurm@2019-09-19T17:39:30]
> >
> > What is the problem? Note that the nodes in my cluster are not running
> > all the time.
> >
> >
> >
> > Regards,
> > Rafal
> >

--
GPG A997BA7A | 87FC DA31 5F00 C885 0DC3 E28F BD0D 4B33 A997 BA7A