Thanks so much! Indeed, it was a mismatch between the node's actual SocketsPerBoard value and the one in slurm.conf.

Sushil
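
For anyone who finds this thread later: a mismatch like this can be spotted by comparing the hardware line printed by `slurmd -C` with the node's `NodeName=` line in slurm.conf. Below is a minimal sketch of that comparison, not an official Slurm tool. The helper names (`parse_node_line`, `find_mismatches`), the choice of fields, and the sample values are all assumptions made for illustration. It only handles simple space-separated `Key=Value` pairs on a single line.

```python
# Hardware fields to compare. This is an assumed subset; extend as needed.
FIELDS = ("CPUs", "Boards", "SocketsPerBoard", "CoresPerSocket", "ThreadsPerCore")


def parse_node_line(line):
    """Split a 'NodeName=... Key=Value ...' line into a dict with lowercase keys."""
    pairs = (token.split("=", 1) for token in line.split() if "=" in token)
    return {key.lower(): value for key, value in pairs}


def find_mismatches(detected_line, configured_line, fields=FIELDS):
    """Return {field: (detected, configured)} for fields present in both lines that differ."""
    detected = parse_node_line(detected_line)
    configured = parse_node_line(configured_line)
    mismatches = {}
    for name in fields:
        key = name.lower()
        if key in detected and key in configured and detected[key] != configured[key]:
            mismatches[name] = (detected[key], configured[key])
    return mismatches


if __name__ == "__main__":
    # Hypothetical values for illustration only; they are not taken from this thread.
    detected = ("NodeName=fucose CPUs=32 Boards=1 SocketsPerBoard=1 "
                "CoresPerSocket=16 ThreadsPerCore=2 RealMemory=64000")
    configured = ("NodeName=fucose CPUs=32 Boards=1 SocketsPerBoard=2 "
                  "CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN")
    for name, (have, want) in find_mismatches(detected, configured).items():
        print(f"{name}: slurmd -C reports {have}, slurm.conf says {want}")
```

In practice the first argument would be the first line of `slurmd -C` output on the node, and the second the matching `NodeName=` line copied from slurm.conf. Fields that appear in only one of the two lines are skipped rather than reported.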

On Tue, Oct 11, 2022 at 11:25 AM Paul H. Hargrove <phhargr...@lbl.gov> wrote:

> I think Rob is "on the right track" here. Specifically, I don't think the
> error message means that "RESUME" is unrecognized as the name of a state.
> Rather, the message means that a state transition from "INVAL" to "RESUME"
> is invalid. I can reproduce that message by trying to "RESUME" an "IDLE"
> node, but "RESUME" works fine for a node which has been recently rebooted.
>
> -Paul
>
>
> On Tue, Oct 11, 2022 at 8:14 AM Groner, Rob <rug...@psu.edu> wrote:
>
>> Have you checked the logs for slurmd and slurmctld? I seem to recall
>> that the "invalid" state for a node meant that there was some discrepancy
>> between what the node says or thinks it has (slurmd -C) and what the
>> slurm.conf says it has. While there is that discrepancy and the node is
>> invalid, you can't just tell it to resume.
>>
>> ------------------------------
>> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
>> Sushil Mishra <sushilbioi...@gmail.com>
>> *Sent:* Tuesday, October 11, 2022 10:08 AM
>> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
>> *Subject:* [slurm-users] slurm_update error: Invalid node state specified
>>
>> Dear all,
>>
>> I am stuck with scontrol not recognizing the state keywords. I wonder if
>> someone can point me to the possible cause of the error. I restarted
>> slurmd a few times, and it didn't help.
>>
>> [sushil@fucose ~]$ sinfo
>> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
>> LocalQ*   up    infinite  1     inval fucose
>>
>> [sushil@fucose ~]$ sinfo -R
>> REASON USER   TIMESTAMP           NODELIST
>> cg     sushil 2022-10-10T18:11:27 fucose
>>
>> [sushil@fucose ~]$ sudo scontrol update NodeName=fucose state=RESUME
>> [sudo] password for sushil:
>> slurm_update error: Invalid node state specified
>>
>> [sushil@fucose ~]$ squeue
>> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
>>
>> Best,
>> Sushil
>>
>
>
> --
> Paul H. Hargrove <phhargr...@lbl.gov>
> Pronouns: he, him, his
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department
> Lawrence Berkeley National Laboratory