Thanks to both of you for your replies. I did the move this morning, and
it went off without a hitch. It does appear that the job state directory
keeps track of the queue data, because as soon as I copied those dirs
over, I was able to see the queue information on the new Slurm controller.
I had done this operation once before, but it was a couple years ago, so
I just wanted to be safe rather than sorry. Thanks for the help.
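For anyone finding this thread later, here is a rough sketch of the copy step in Python. It is only an illustration, not the exact procedure I followed: the helper names (parse_slurm_conf, state_save_location, copy_state_dir) are made up for this example, and it assumes the state directory is the one named by StateSaveLocation in slurm.conf and that slurmctld is stopped on both hosts while copying.

```python
import shutil
from pathlib import Path


def parse_slurm_conf(text):
    """Map lower-cased parameter names to values for simple Key=Value lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip().lower()] = value.strip()
    return params


def state_save_location(text):
    """Return the StateSaveLocation value, or None if it is not set."""
    return parse_slurm_conf(text).get("statesavelocation")


def copy_state_dir(src, dst):
    """Copy the state directory tree (hypothetical helper).

    Assumption: slurmctld is stopped, so the files are not changing.
    copytree keeps permissions and timestamps but not file ownership,
    so ownership may need fixing on the new host afterwards.
    """
    shutil.copytree(Path(src), Path(dst), dirs_exist_ok=True)
```

In practice the copy goes between hosts, so a tool such as rsync or scp would take the place of copy_state_dir; the parsing part just confirms which directory to move.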
On 1/16/21 1:43 PM, Michael Gutteridge wrote:
I'd confirm that as well. The state directory has all of that
information. We just upgraded from 18.05 to 20.02 on a different host
and while the cluster was quiet (we had a maintenance reservation in
place) there were running jobs which survived the upgrade.
I think the big thing to watch out for is setting SlurmdTimeout in
your slurm.conf prior to the upgrade. Might not be necessary depending on
the exact steps you're using, but it's useful insurance against job loss.
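As a rough illustration of that check, the Python sketch below reads the timeout from the text of a slurm.conf and compares it with how long you expect the controller to be unreachable. The function names and the outage estimate are hypothetical, and the handling of a zero value reflects my reading of the slurm.conf man page (zero means nodes are never marked DOWN for a non-responsive slurmd), so verify that against the docs for your version.

```python
def slurmd_timeout(conf_text):
    """Return SlurmdTimeout in seconds from slurm.conf text, or None if not set."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() == "slurmdtimeout":
            return int(value.strip())
    return None


def timeout_covers_outage(conf_text, outage_seconds):
    """True if the configured timeout is at least the expected outage."""
    timeout = slurmd_timeout(conf_text)
    if timeout is None:
        # Not set explicitly; the built-in default applies, so check the docs.
        return False
    if timeout == 0:
        # Assumption: zero disables the non-responsive check entirely.
        return True
    return timeout >= outage_seconds
```

For example, timeout_covers_outage(text, 1800) asks whether the configured value would ride out a 30-minute move.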
- Michael
On Fri, Jan 15, 2021 at 7:51 PM Ryan Novosielski wrote:
My understanding is that it's the job state directory. Theoretically,
if you back it up, then screw up and lose it, you can restore it and
try again. There's some mention of this in the upgrade docs, if I'm not
mistaken (they suggest backing it up in case you mess up during the
upgrade).
|| \\UTGERS,
||_// the State | Ryan Novosielski
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
On Jan 15, 2021, at 13:44, Prentice Bisbal wrote:
Slurm users,
I'm planning on moving slurmctld and slurmdbd to a new host. I
know how to dump the MySQL DB from the old server and import it
to the new slurmdbd host, and I know how to copy the job state
directories to the new host. I plan on doing this during our next
maintenance window when there are no jobs running on the cluster.
However, there will be plenty of jobs in the queue, so my
question is this: What will happen to jobs in the queue when I do
this? Is the queue information stored in the database or the job
state directories, or a third location? How can I make sure I
don't lose the state of the queue?
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory