Hi all,
We have various NFS servers that contain the data our researchers want to
process. These are mounted on our Slurm clusters at well-known paths, and the
nodes also have fast local scratch disk at another well-known path. We do not
have any distributed file systems in use. (Our Slurm clusters are basically
just collections of heterogeneous nodes of differing types, not a traditional
HPC setup by any means.)
In most cases, the researchers can process the data directly off the NFS
mounts without any issues, but in some cases this slows the computation down
unacceptably. They could manually copy the data to the local drive using an
allocation and srun commands, but I am wondering: is there a way to do this
with sbatch?
I tried this method:
wdennis@submit01 ~> sbatch transfer.sbatch
Submitted batch job 329572
wdennis@submit01 ~> sbatch --dependency=afterok:329572 test_job.sbatch
Submitted batch job 329573
wdennis@submit01 ~> sbatch --dependency=afterok:329573 rm_data.sbatch
Submitted batch job 329574
wdennis@submit01 ~>
wdennis@submit01 ~> squeue
 JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
329573       gpu wdennis_  wdennis PD  0:00     1 (Dependency)
329574       gpu wdennis_  wdennis PD  0:00     1 (Dependency)
329572       gpu wdennis_  wdennis  R  0:23     1 compute-gpu02
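(For completeness, the same chain can be scripted so the job IDs do not have
to be copied by hand; sbatch's --parsable option prints just the job ID. A
sketch, using the same script names as above:)

```shell
#!/bin/bash
# Submit the three-stage chain, capturing each job ID for the next dependency.
# --parsable makes sbatch print only the job ID instead of the full message.
xfer=$(sbatch --parsable transfer.sbatch)
comp=$(sbatch --parsable --dependency=afterok:${xfer} test_job.sbatch)
sbatch --dependency=afterok:${comp} rm_data.sbatch
```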
But the --dependency chaining does not keep the jobs on the same node: sacct
shows the compute job (329573) ran on a different node than the transfer and
removal jobs:
JobID|JobName|User|Partition|NodeList|AllocCPUS|ReqMem|CPUTime|QOS|State|ExitCode|AllocTRES|
329572|wdennis_data_transfer|wdennis|gpu|compute-gpu02|1|2Gc|00:02:01|normal|COMPLETED|0:0|cpu=1,mem=2G,node=1|
329573|wdennis_compute_job|wdennis|gpu|compute-gpu05|1|128Gn|00:03:00|normal|COMPLETED|0:0|cpu=1,mem=128G,node=1,gres/gpu=1|
329574|wdennis_data_removal|wdennis|gpu|compute-gpu02|1|2Gc|00:00:01|normal|COMPLETED|0:0|cpu=1,mem=2G,node=1|
What is the best way to do something like “stage the data on a local path / run
computation using the local copy / remove the locally staged data when
complete”?
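For example, would folding all three steps into one batch script be the
recommended pattern, so everything is guaranteed to run on the same node? A
rough sketch of what I mean (the scratch path, NFS path, and computation
command are all invented placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=wdennis_staged_job
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

# Hypothetical paths: NFS-mounted source data and node-local scratch.
SRC=/nfs/projects/mydata
SCRATCH=/local_scratch/${USER}/${SLURM_JOB_ID}

mkdir -p "${SCRATCH}"
# Remove the staged copy on exit, whether the job succeeds or fails.
trap 'rm -rf "${SCRATCH}"' EXIT

# Stage in, compute against the local copy, then stage results back out.
cp -a "${SRC}" "${SCRATCH}/"
./run_computation --input "${SCRATCH}/mydata" --output "${SCRATCH}/results"
cp -a "${SCRATCH}/results" "${SRC}/"
```

Or is there a more idiomatic Slurm mechanism for this kind of staging?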
Thanks!
Will