On 4/16/21 4:21 PM, Ole Holm Nielsen wrote:
I'm thinking of a reservation something like this:
scontrol create reservation starttime=... duration=12:00:00 \
    ReservationName=migrate_physics nodes=ALL Accounts=-physics
For the record: The idea of creating a Slurm reservation for excluding
specified accounts from running jobs seems to be a viable one. The
question is being tracked in https://bugs.schedmd.com/show_bug.cgi?id=11404
The correct way to make such a reservation is actually to add several flags:
$ scontrol create reservation reservationname=exclude_account \
    starttime=13:40:00 duration=30:00 flags=ignore_jobs,magnetic,flex \
    nodes=ALL accounts=-sub1
Caveat: This will result in all pending jobs showing an incorrect
Reason=(ReqNodeNotAvail, Reserved for maintenance). Jobs from other
accounts do seem to start correctly, however, so this achieves the goal,
but it will probably also cause confusion among users!
SchedMD is looking at a way to enhance a future Slurm version so that the
incorrect Reason doesn't appear.
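
For anyone who wants to try this on more than one account, here is a minimal sketch that only prints the reservation command so it can be reviewed before running it. The function name exclude_account_cmd, its arguments, and the exclude_<account> naming of the reservation are my own assumptions for illustration; the scontrol options are the ones shown above.

```shell
#!/bin/sh
# Sketch only: build the exclusion reservation command for one account.
# The function name and argument order are hypothetical; the scontrol
# options are copied from the command quoted in this message.
exclude_account_cmd() {
    account=$1
    start=$2
    duration=$3
    printf 'scontrol create reservation reservationname=exclude_%s starttime=%s duration=%s flags=ignore_jobs,magnetic,flex nodes=ALL accounts=-%s\n' \
        "$account" "$start" "$duration" "$account"
}

# Print the command for review; pipe the output to "sh" to actually run it.
exclude_account_cmd sub1 13:40:00 30:00

# Once the reservation exists, it and the pending jobs' Reason can be inspected:
#   scontrol show reservation exclude_sub1
#   squeue --states=PENDING --format='%i %a %r'
```

The squeue output lists job ID, account, and Reason, which should make it easy to see whether the misleading Reason described in the caveat shows up on your system.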
On 16/04/2021 14.23, Ole Holm Nielsen wrote:
I need to migrate several sets of user home directories from an old NFS
file server to a new NFS file server. Each group of users belongs to
specific Slurm accounts organized in a hierarchical tree.
I want to make the migration while the cluster is in full production
mode for all the other accounts (the terms "service window" or
"downtime" don't exist for me :-)
My idea is to make a Slurm reservation so that the accounts in question
will have zero jobs running during the reservation, and I also need to
kick users off the login nodes. During the reservation I can rsync the
home directories from the old NFS server to the new NFS server and
update the NFS automounter links.
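
Roughly, the copy step during the reservation could look like the sketch below. It only prints the rsync commands, so nothing is copied until the output is reviewed. The mount points /old_nfs/home and /new_nfs/home, the variable names, and the function name migrate_cmd are placeholders I made up for illustration; the rsync options are one reasonable choice, not a tested recipe.

```shell
#!/bin/sh
# Sketch only: print the rsync command for each user given on the command line.
# OLD_HOME and NEW_HOME are hypothetical mount points of the two NFS servers.
OLD_HOME=${OLD_HOME:-/old_nfs/home}
NEW_HOME=${NEW_HOME:-/new_nfs/home}

# Print the rsync command that would copy one user's home directory.
# -a preserves permissions and times, -H preserves hard links, and
# --delete removes files on the target that no longer exist on the source.
migrate_cmd() {
    user=$1
    printf 'rsync -aH --delete %s/%s/ %s/%s/\n' \
        "$OLD_HOME" "$user" "$NEW_HOME" "$user"
}

# Before copying, confirm that the account has no running jobs, e.g.:
#   squeue --account=sub1 --states=RUNNING --noheader
# and end any remaining login sessions on each login node, e.g.:
#   pkill -KILL -u <user>

for user in "$@"; do
    migrate_cmd "$user"
done
```

Running the script with a list of user names prints one rsync line per user; the automounter update would still be a separate manual step.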
Question: Does anyone have experience with this type of scenario? Any
good ideas or suggestions for other methods for data migration?
/Ole