On 4/16/21 4:21 PM, Ole Holm Nielsen wrote:
I'm thinking of a reservation something like this:

scontrol create reservation starttime=...  duration=12:00:00 ReservationName=migrate_physics nodes=ALL Accounts=-physics

For the record: The idea of creating a Slurm reservation for excluding specified accounts from running jobs seems to be a viable one. The question is being tracked in https://bugs.schedmd.com/show_bug.cgi?id=11404

The correct way to make such a reservation is actually to add several flags:

$ scontrol create reservation reservationname=exclude_account starttime=13:40:00 duration=30:00 flags=ignore_jobs,magnetic,flex nodes=ALL accounts=-sub1
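
When several accounts have to be excluded at once, the command line can be composed by a small helper. The following is only a sketch: the function name exclude_accounts_cmd is hypothetical, it prints the command instead of running it, and the comma-separated form accounts=-sub1,-sub2 is an assumption based on the scontrol man page that should be verified on your Slurm version.

```shell
# Hypothetical helper: print (do not run) a reservation command that
# excludes every account given after the first three arguments.
exclude_accounts_cmd() {
    name=$1; start=$2; duration=$3
    shift 3
    accounts=""
    for acct in "$@"; do
        # Prefix each account with '-' to deny it access (assumed syntax
        # for more than one account: -a,-b).
        accounts="${accounts:+$accounts,}-$acct"
    done
    printf 'scontrol create reservation reservationname=%s starttime=%s duration=%s flags=ignore_jobs,magnetic,flex nodes=ALL accounts=%s\n' \
        "$name" "$start" "$duration" "$accounts"
}

# Reproduces the command shown above for the single account sub1.
exclude_accounts_cmd exclude_account 13:40:00 30:00 sub1
```

Review the printed line before pasting it into a shell. "scontrol show reservation" lists the result, and "scontrol delete reservationname=exclude_account" removes it again.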

Caveat: All Pending jobs will show an incorrect Reason=(ReqNodeNotAvail, Reserved for maintenance). Jobs from other accounts nevertheless seem to start correctly, so the reservation does achieve the goal, but the misleading Reason will probably confuse users!
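
To see how many pending jobs carry which Reason while the reservation is active, the squeue output can be tallied per account. This is a minimal sketch: the helper name count_pending_reasons and the "|" separator are arbitrary choices, and the exact Reason strings printed by squeue may differ between Slurm versions.

```shell
# Hypothetical helper: read "account|reason" lines on stdin and print
# "count|account|reason", one line per distinct combination.
count_pending_reasons() {
    awk '{ n[$0]++ } END { for (k in n) print n[k] "|" k }' | sort
}

# Only query the scheduler where squeue is actually installed.
if command -v squeue >/dev/null 2>&1; then
    # %a = account, %r = reason the job is pending
    squeue --noheader --states=PENDING --format='%a|%r' | count_pending_reasons
fi
```

Running "squeue --account=sub1 --states=RUNNING" during the reservation should list no jobs if the exclusion works as intended.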

SchedMD is looking at a way to enhance a future Slurm version so that the incorrect Reason doesn't appear.


On 16/04/2021 14.23, Ole Holm Nielsen wrote:
I need to migrate several sets of user home directories from an old NFS file server to a new NFS file server.  Each group of users belongs to specific Slurm accounts organized in a hierarchical tree.

I want to make the migration while the cluster is in full production mode for all the other accounts (the terms "service window" or "downtime" don't exist for me :-).

My idea is to make a Slurm reservation so that the accounts in question will have zero jobs running during the reservation, and I also need to kick users off the login nodes.  During the reservation I can rsync the home directories from the old NFS server to the new NFS server and update the NFS automounter links.
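
The copy step could be prepared as a list of rsync commands, one per user, that is reviewed before anything is executed. This is only a sketch under stated assumptions: the mount points /old_nfs/home and /new_nfs/home, the user names, and the helper name print_rsync_cmds are all hypothetical, and the rsync options are one common choice rather than a requirement.

```shell
# Hypothetical mount points of the old and new NFS servers.
OLD_ROOT=/old_nfs/home
NEW_ROOT=/new_nfs/home

# Hypothetical helper: print (do not run) one rsync command per user.
# -a archive mode, -H hard links, -A ACLs, -X extended attributes,
# --numeric-ids keeps UID/GID numbers instead of mapping names.
print_rsync_cmds() {
    for user in "$@"; do
        printf 'rsync -aHAX --numeric-ids %s/%s/ %s/%s/\n' \
            "$OLD_ROOT" "$user" "$NEW_ROOT" "$user"
    done
}

# Example with two made-up user names.
print_rsync_cmds alice bob
```

A first pass can be run before the reservation starts, so that the pass during the reservation only has to transfer the files changed since then.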

Question:  Does anyone have experience with this type of scenario? Any good ideas or suggestions for other methods of data migration?

/Ole
