[slurm-users] setting default working directory in prolog
Hello List, I am a Slurm newbie and would like to set a job-specific working directory as the default for a job's processing, to make use of the local disks on the compute nodes, using a (slurmd) job prolog, thereby sparing users from having to take care of this in their batch scripts. The "SLURM_JOB_WORK_DIR" variable seems to address exactly this, but apparently it requires slurm-21.08, which I don't have available yet. Is that true? Since it seems such a crucial feature: how have you handled similar setups on 20.11.x or earlier? Creating directories and moving data works for me, but the processing still happens on the data in the user's home (submission) directory... Just a brief idea would help a lot. Best regards.
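One possible pre-21.08 approach, sketched below under the assumption of a node-local /scratch filesystem (the path and the JOB_SCRATCH variable name are illustrative, not Slurm-defined): the slurmd Prolog creates a per-job directory, and a TaskProlog publishes its path to the job's environment, since lines a TaskProlog prints in the form "export NAME=value" are injected into the task environment. A prolog cannot change the job's working directory itself, so the batch script still needs a "cd $JOB_SCRATCH".

    #!/bin/bash
    # Prolog (runs as root on the compute node before the job starts).
    # Create a per-job scratch directory on local disk; /scratch is an
    # assumed node-local mount point, adjust to your site.
    mkdir -p "/scratch/${SLURM_JOB_ID}"
    chown "${SLURM_JOB_USER}" "/scratch/${SLURM_JOB_ID}"

    #!/bin/bash
    # TaskProlog (runs as the job user). Lines printed as
    # "export NAME=value" are injected into the task environment, so the
    # job can "cd $JOB_SCRATCH". JOB_SCRATCH is an illustrative name.
    echo "export JOB_SCRATCH=/scratch/${SLURM_JOB_ID}"

A matching Epilog would remove the directory (after staging results back) when the job ends.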
[slurm-users] backfill on overlapping partitions problem
Hi, we have a strange problem with backfilling. There is a large partition "cpu" and an overlapping partition "largemem" whose nodes are a subset of "cpu". Now, user A submits low-priority jobs to "cpu" and user B submits high-priority jobs to "largemem". If there are queued jobs in "largemem" (draining nodes there), slurmctld never backfills "cpu". In the extreme case, the non-overlapping "cpu" nodes go empty until all the higher-priority jobs in "largemem" are running. Any hint or workaround here? Backfill works quite fine if all jobs are submitted to the "cpu" partition, and user A's jobs are typically smaller and shorter, so they are good candidates for backfilling. We use these settings with Slurm:

    PriorityType=priority/multifactor
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CORE_MEMORY,CR_CORE_DEFAULT_DIST_BLOCK
    SchedulerParameters=bf_max_job_test=2000,bf_window=1440,default_queue_depth=1000,bf_continue

Best regards,
Andrej

--
prof. dr. Andrej Filipcic, E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.O. Box 3000, SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674  Fax: +386-1-425-7074
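Not a fix, but when debugging a situation like this it can help to watch the backfill scheduler's reasoning directly. The sketch below only enables extra logging in slurmctld's log; it does not change scheduling behavior.

    # Enable backfill debug logging at runtime, reproduce the situation
    # over a few backfill cycles, then turn it off again:
    scontrol setdebugflags +backfill
    scontrol setdebugflags -backfill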
[slurm-users] errors requesting gpus
Hi, I'm setting up a Slurm cluster where a subset of the compute nodes will have GPUs. My slurm.conf contains, among other lines:

    [...]
    GresTypes=gpu
    [...]
    Include /etc/slurm/slurm.conf.d/allnodes
    [...]

and the abovementioned /etc/slurm/slurm.conf.d/allnodes file has the line

    NodeName=gpu1601 CPUs=12 RealMemory=257840 Gres=gpu:gtx1080:4

On the host gpu1601, the file /etc/slurm/gres.conf contains

    NodeName=gpu1601 Name=gpu Type=gtx1080 File=/dev/nvidia[0-3]

However, when I try to srun something with 1 GPU, I get:

    srun: error: gres_plugin_job_state_unpack: no plugin configured to unpack data type 7696487 from job 22. This is likely due to a difference in the GresTypes configured in slurm.conf on different cluster nodes.
    srun: gres_plugin_step_state_unpack: no plugin configured to unpack data type 7696487 from StepId=22.0
    srun: error: fwd_tree_thread: can't find address for host gpu1601, check slurm.conf
    srun: error: Task launch for StepId=22.0 failed on node gpu1601: Can't find an address, check slurm.conf
    srun: error: Application launch failed: Can't find an address, check slurm.conf
    srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
    srun: error: Timed out waiting for job step to complete

I'm not sure whether the relevant error is the "no plugin configured" part or the "Can't find an address" part. "gpu1601" is pingable from both the submit host and the controller host, and the Slurm daemons seem to be running without errors. Am I missing something stupidly obvious? Thanks,

~~ bnacar

--
Benjamin Nacar
Systems Programmer
Computer Science Department
Brown University
401.863.7621
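As the first error message itself suggests, "no plugin configured to unpack data type" usually points at a GresTypes/slurm.conf mismatch between hosts, while "can't find address" points at name resolution of gpu1601 on the node doing the task launch. A rough first check, assuming passwordless ssh (the host names "controller" and "submit" are illustrative):

    # Compare the GresTypes setting on every host; any difference between
    # the controller, the submit host and the compute node can trigger
    # the unpack error.
    for h in controller submit gpu1601; do
        echo "== $h =="
        ssh "$h" 'grep -ri "^GresTypes" /etc/slurm/slurm.conf /etc/slurm/slurm.conf.d/ 2>/dev/null'
    done
    # After fixing slurm.conf, restart slurmctld and all slurmd daemons
    # so they agree on the configuration.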
Re: [slurm-users] backfill on overlapping partitions problem
Hi Andrej,

Take a look at this and see if it matches up with your issue (I'm not 100% sure based on your description): https://bugs.schedmd.com/show_bug.cgi?id=3881

The takeaway from that is the following (quote from SchedMD): "If there are _any_ jobs pending (regardless of the reason for the job still pending) in a partition with a higher Priority, no jobs from a lower Priority will be launched on nodes that are shared in common."

The above is apparently pretty intrinsic to how Slurm scheduling works, and is unlikely to change. We worked around this by keeping all partitions at the same priority and using QOS instead for priority/preemption. That has the unfortunate side effect of tying down your QOSes to be used for that purpose, but it works for our situation.

Best of luck,
-Matt

Matt Jay
Sr. HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology

-----Original Message-----
From: slurm-users On Behalf Of Andrej Filipcic
Sent: Tuesday, October 26, 2021 7:42 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] backfill on overlapping partitions problem
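A minimal sketch of that workaround, with illustrative node ranges and QOS names (this is not Matt's actual configuration): both partitions keep the same PriorityTier, and the priority/preemption difference is expressed through a QOS instead.

    # slurm.conf excerpt: equal PriorityTier, so neither partition blocks
    # backfill on the shared nodes; preemption is delegated to QOS.
    PartitionName=cpu      Nodes=node[001-100] PriorityTier=1 Default=YES
    PartitionName=largemem Nodes=node[081-100] PriorityTier=1
    PreemptType=preempt/qos
    PreemptMode=REQUEUE

    # sacctmgr: create a high-priority QOS that may preempt jobs running
    # under the default "normal" QOS.
    sacctmgr add qos highmem
    sacctmgr modify qos where name=highmem set Priority=1000 Preempt=normal

Users of the large-memory nodes would then submit with --partition=largemem --qos=highmem, after the QOS has been added to their associations (e.g. sacctmgr modify user ... set qos+=highmem).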
Re: [slurm-users] slurm.conf syntax checker?
Hi Diego, sorry for the delay.

On 10/18/21 14:20, Diego Zuccato wrote:
> On 15/10/2021 06:02, Marcus Wagner wrote:
>> Mostly, our problem was that we forgot to add/remove a node to/from the partitions/topology file, which caused slurmctld to refuse to start. So I wrote a simple checker for that. Here is the output of a sample run:
> Even "just" catching syntax errors and the most common errors is already a big help, especially for noobs :)
>> [OK]: All nodeweights are correct.
> What do you mean by this? How can weights be "incorrect"?

We are using node weights calculated from different factors, like CPU generation, memory, cores, and available generic resources. We have, e.g., some nodes with additional NVMe disks; these should be scheduled later than the nodes without NVMes, but can be forced for scheduling by asking for the constraint "nvme". My checker calculates these weights, so I do not have to calculate them myself, just insert the calculated value. Example output (instead of "[OK]: All nodeweights are correct."):

    NodeName=lns[07-08] Sockets=8 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=102 Feature=broadwell,bwx8860,nvme,hostok,hpcwork Gres=gpu:pascal:1 Weight=111544(was 1) State=UNKNOWN

So the correct weight is 111544, but I had set it to "1" in the config file. The checker tells me "Weight=111544(was 1)", i.e. that the correct value for this kind of node would be 111544 and not "1".

Best,
Marcus

>> If someone is interested ...
> Surely I am :)

--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
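Marcus's checker itself is not shown in the thread, but the node-list part of the idea can be sketched roughly as below (GNU grep, standard config paths, and single-token Nodes=/NodeName= entries are all assumptions; this is illustrative, not Marcus's code):

    #!/bin/bash
    # Report nodes defined in slurm.conf but missing from topology.conf,
    # and vice versa. "scontrol show hostnames" expands hostlist
    # expressions like lns[07-08] into individual node names.
    # A real checker would also skip NodeName=DEFAULT lines.
    expand_hosts() { xargs -n1 scontrol show hostnames | sort -u; }
    conf=$(grep -oP '^NodeName=\K[^ ]+' /etc/slurm/slurm.conf   | expand_hosts)
    topo=$(grep -oP 'Nodes=\K[^ ]+'     /etc/slurm/topology.conf | expand_hosts)
    # Lines unique to either side indicate a node missing on the other:
    comm -3 <(printf '%s\n' "$conf") <(printf '%s\n' "$topo")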