When I need to do something like this, I let Slurm's automatic node management do the job. I just shut the nodes down over SSH, replace whatever needs replacing, then power them back on, and everything comes back up OK. Another option, in case of any failure, is to resume the nodes and restart the Slurm services on them... Regards
*Ing. Gonzalo E. Arroyo - CPA Profesional*
IFIMAR - CONICET
*www.ifimar-conicet.gob.ar <http://www.ifimar-conicet.gob.ar>*

This message is confidential. It may also contain information that is privileged or not authorized to be disclosed. If you have received it by mistake, delete it from your system. You should not copy the message or disclose its contents to anyone. Thanks.

On Thu, Aug 6, 2020 at 14:13, Jason Simms (<sim...@lafayette.edu>) wrote:
> Hello all,
>
> Later this month, I will have to bring down, patch, and reboot all nodes
> in our cluster for maintenance. The two options available to set nodes into
> a maintenance mode seem to be either: 1) creating a system-wide
> reservation, or 2) setting all nodes into a DRAIN state.
>
> I'm not sure it really matters either way, but is there any preference one
> way or the other? Any gotchas I should be aware of?
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632
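For reference, the steps discussed in this thread (a system-wide reservation or a DRAIN before the window, then resuming nodes and restarting the Slurm daemons afterwards) map onto standard scontrol/systemctl commands. The following is only a minimal sketch that builds those command lines and, by default, just prints them (dry run). The node list `node[01-10]`, the reservation name `maint`, the start time, the 480-minute duration and the hostnames are hypothetical placeholders, and it assumes slurmd is managed by systemd and reachable over SSH.

```python
import shlex
import subprocess
from datetime import datetime
from typing import List

# Hypothetical placeholders: adjust to your own cluster.
NODES = "node[01-10]"    # Slurm hostlist expression (assumed node names)
RESERVATION = "maint"    # assumed reservation name


def reservation_cmd(start: datetime, duration_min: int,
                    name: str = RESERVATION) -> List[str]:
    """Option 1: a system-wide maintenance reservation covering all nodes."""
    return [
        "scontrol", "create", "reservation",
        f"reservationname={name}",
        f"starttime={start.strftime('%Y-%m-%dT%H:%M:%S')}",
        f"duration={duration_min}",   # length of the window, in minutes
        "users=root",
        "flags=maint,ignore_jobs",    # MAINT accounting; ignore running jobs
        "nodes=ALL",
    ]


def delete_reservation_cmd(name: str = RESERVATION) -> List[str]:
    """Remove the reservation once maintenance is finished."""
    return ["scontrol", "delete", f"reservationname={name}"]


def drain_cmd(nodes: str, reason: str) -> List[str]:
    """Option 2: drain nodes (running jobs finish, no new jobs start)."""
    # Passed as a list, so a reason containing spaces stays one argument.
    return ["scontrol", "update", f"nodename={nodes}", "state=drain",
            f"reason={reason}"]


def resume_cmd(nodes: str) -> List[str]:
    """Return drained or down nodes to service after the reboot."""
    return ["scontrol", "update", f"nodename={nodes}", "state=resume"]


def restart_slurmd_cmd(host: str) -> List[str]:
    """Restart slurmd on one node over SSH (assumes systemd)."""
    return ["ssh", host, "systemctl", "restart", "slurmd"]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Print the command line; execute it only when dry_run is False."""
    line = shlex.join(cmd)
    if dry_run:
        print("DRY RUN:", line)
    else:
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    # Pick ONE of the two options before the window starts.
    run(reservation_cmd(datetime(2020, 8, 20, 8, 0), duration_min=480))
    # run(drain_cmd(NODES, "scheduled maintenance"))

    # ... shut down, patch, reboot ...

    # Afterwards, if nodes did not come back on their own:
    for host in ["node01", "node02"]:   # hypothetical hostnames
        run(restart_slurmd_cmd(host))
    run(resume_cmd(NODES))
    run(delete_reservation_cmd())
```

On the choice between the two: a DRAIN lets running jobs finish but starts nothing new, so the nodes simply empty out, and they stay drained across the reboot until you resume them. A MAINT reservation instead keeps the scheduler from starting jobs whose time limit would overlap the window, so shorter jobs can still run right up to it, which works best when users set realistic time limits. Whether rebooted nodes return to service automatically, as described above, depends on the ReturnToService setting in slurm.conf.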