How about a reservation that overlaps the time you have the node in drain? Drain and reserve:
# scontrol update nodename=node[037] state=drain reason="testing"
# scontrol create reservation users=renfro reservationname='drain_test' nodes=node[037] starttime=2018-10-05T08:17:00 endtime=2018-10-05T09:00:00

Users can't allocate anything on the drained node (as expected; hpcshell is just a shell function that runs srun bash with the usual arguments):

[renfro@login ~]$ hpcshell --reservation=drain_test
srun: Required node not available (down, drained or reserved)
srun: job 135579 queued and waiting for resources
^Csrun: Job allocation 135579 has been revoked
srun: Force Terminated job 135579

Resume while the reservation is in place:

# scontrol update nodename=node[037] state=resume

Users in the reservation can then allocate the previously-drained node:

[renfro@login ~]$ hpcshell --reservation=drain_test
[renfro@node037 ~]$

--
Mike Renfro / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

> On Oct 5, 2018, at 8:06 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
>
> Is anyone on the list using maintenance partitions for broken nodes?
> If so, how are you moving nodes between partitions?
>
> The situation with my machines at the moment is that we have a steady
> stream of new jobs coming into the queues, but broken nodes as well.
> I'd like to fix those broken nodes and re-add them to a separate,
> non-production pool so that user jobs don't match, but still allow me
> to run maintenance jobs on the nodes to prove things are working
> before giving them back to the users.
>
> If I simply mark nodes with DownNodes= or scontrol update state=drain,
> Slurm will prevent users from starting new jobs, but won't allow me to
> run jobs on the nodes.
>
> Ideally, I'd like to have a prod partition and a maint partition,
> where the maint partition is set to ExclusiveUser, and I can set the
> status of a node in the prod partition to drain without affecting the
> node status in the maint partition. I don't believe I can do this,
> though. I believe I have to change slurm.conf and reconfigure to
> add/remove nodes from one partition or the other.
>
> If anyone has a better solution, I'd like to hear it.
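
For anyone unfamiliar with the hpcshell helper used above: the thread only says it is a shell function wrapping srun bash with the usual arguments, so here is a minimal sketch of what such a function might look like. The --pty flag and "bash -i" arguments are assumptions, not the actual Tennessee Tech definition:

# Minimal sketch of an hpcshell-style helper: forward any user-supplied
# options (e.g. --reservation=drain_test) to srun and start an
# interactive shell on the allocated node. Site-specific defaults
# (partition, time limit, memory) would normally be added here too.
hpcshell () {
    srun --pty "$@" bash -i
}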
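Putting the pieces together for the maintenance use case in the quoted message, a full drain/reserve/resume/verify cycle might look like the sketch below. The reservation name, duration, and the node_healthcheck script are illustrative assumptions, not anything shown in the thread:

# Sketch of a drain -> reserve -> resume -> verify cycle for one node.
# Names, times, and the healthcheck script are hypothetical.

# Keep new user jobs off the node while it is being repaired:
scontrol update nodename=node037 state=drain reason="hw repair"

# Reserve the node for the admin account so only it can schedule there:
scontrol create reservation reservationname=maint_node037 users=root \
    nodes=node037 starttime=now duration=02:00:00

# Clear the drain so jobs inside the reservation can start:
scontrol update nodename=node037 state=resume

# Run verification jobs against the reservation:
srun --reservation=maint_node037 -w node037 /usr/local/bin/node_healthcheck

# When satisfied, release the node back to production:
scontrol delete reservationname=maint_node037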