[slurm-users] setting default working directory in prolog

2021-10-26 Thread Stefan Kelber
Hello List,

I am a Slurm newbie and would like to set a job-specific working directory for each job's processing, so that the local disks on the compute nodes are used by default. I would like to do this with a job prolog (slurmd), sparing users from having to take care of it in their batch scripts.

The "SLURM_JOB_WORK_DIR" variable seems to be addressing such an effort.
But apparently it depends on the release of slurm-21.08, i don't have available 
yet.
Is that true?
Since it seems such a crucial feature: how did you have handled similar efforts 
20.11.x or earlier?
Creating directories and moving data works for me, but still the process 
happens on the data in the user's home (submission) directory...
Just a brief idea would help a lot.
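
For reference, the part that already works looks roughly like this; it is only a sketch of a TaskProlog (rather than the slurmd Prolog), and the script path and the /local/scratch mount point are just placeholders:

#!/bin/bash
# /etc/slurm/taskprolog.sh (hypothetical path, pointed to by TaskProlog= in slurm.conf)
# Runs on the compute node as the job user; lines printed as "export NAME=value"
# are picked up by slurmstepd and placed into the task's environment.
SCRATCH=/local/scratch/$(id -un)/${SLURM_JOB_ID}   # /local/scratch is only an example mount
mkdir -p "$SCRATCH"
echo "export JOB_SCRATCH=$SCRATCH"
echo "export TMPDIR=$SCRATCH"

The catch is that this only exports the path; the job still runs from the submission directory unless the batch script does cd "$JOB_SCRATCH" itself, which is exactly the step I was hoping to spare the users.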

Best regards.



[slurm-users] backfill on overlapping partitions problem

2021-10-26 Thread Andrej Filipcic



Hi,

We have a strange problem with backfilling: there is a large partition "cpu" and an 
overlapping partition "largemem" whose nodes are a subset of the "cpu" nodes.


Now, user A is submitting low-priority jobs to "cpu" and user B high-priority 
jobs to "largemem".
If there are queued jobs in "largemem" (draining nodes there), slurmctld will 
never backfill "cpu". In the extreme case, the non-overlapping "cpu" nodes go 
empty until the higher-priority jobs in "largemem" are all running.


Any hint or workaround here? Backfill works quite well if all the jobs are 
submitted to the "cpu" partition. User A typically has smaller and shorter 
jobs, good for backfilling.


We use these settings with Slurm:
PriorityType=priority/multifactor
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CORE_MEMORY,CR_CORE_DEFAULT_DIST_BLOCK
SchedulerParameters=bf_max_job_test=2000,bf_window=1440,default_queue_depth=1000,bf_continue
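
For completeness, we have mostly been watching the scheduler behaviour with the 
standard tools while the nodes drain:

squeue -p cpu --start   # expected start times the backfill scheduler has computed
sdiag                   # the "Backfilling stats" section shows the backfill cycle counters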


Best regards,
Andrej

--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-425-7074
-




[slurm-users] errors requesting gpus

2021-10-26 Thread Benjamin Nacar
Hi,

I'm setting up a Slurm cluster where a subset of the compute nodes will have 
GPUs. My slurm.conf contains, among other lines:

[...]
GresTypes=gpu
[...]
Include /etc/slurm/slurm.conf.d/allnodes
[...]

and the abovementioned /etc/slurm/slurm.conf.d/allnodes file has the line

NodeName=gpu1601 CPUs=12 RealMemory=257840 Gres=gpu:gtx1080:4

On the host gpu1601, the file /etc/slurm/gres.conf contains

NodeName=gpu1601 Name=gpu Type=gtx1080 File=/dev/nvidia[0-3]

However, when I try to srun something with 1 gpu, I get:

srun: error: gres_plugin_job_state_unpack: no plugin configured to unpack data 
type 7696487 from job 22. This is likely due to a difference in the GresTypes 
configured in slurm.conf on different cluster nodes.
srun: gres_plugin_step_state_unpack: no plugin configured to unpack data type 
7696487 from StepId=22.0
srun: error: fwd_tree_thread: can't find address for host gpu1601, check 
slurm.conf
srun: error: Task launch for StepId=22.0 failed on node gpu1601: Can't find an 
address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete

I'm not sure whether the relevant error is the "no plugin configured" part or 
the "Can't find an address" part. "gpu1601" is pingable from both the submit 
host and the controller host. The slurm daemons seem to be running without 
errors.
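
In case the two errors share a cause, the next thing on my list is to compare the 
configuration each daemon actually sees, roughly like this (just a sketch, run on 
the controller, the submit host and gpu1601):

md5sum /etc/slurm/slurm.conf /etc/slurm/slurm.conf.d/allnodes   # config files as deployed on each host
scontrol show config | grep -i grestypes                        # what the running daemons report
scontrol show node gpu1601 | grep -i -E 'nodeaddr|gres'         # address and GRES as seen by slurmctld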

Am I missing something stupidly obvious?

Thanks,
~~ bnacar

-- 
Benjamin Nacar
Systems Programmer
Computer Science Department
Brown University
401.863.7621



Re: [slurm-users] backfill on overlapping partitions problem

2021-10-26 Thread Matt Jay
Hi Andrej,

Take a look at this, and see if it matches up with your issue (I'm not 100% 
sure based on your description):
https://bugs.schedmd.com/show_bug.cgi?id=3881

The takeaway from that is the following (quote from SchedMD): " If there are 
_any_ jobs pending (regardless of the reason for the job still pending) in a 
partition with a higher Priority, no jobs from a lower Priority will be 
launched on nodes that are shared in common."

The above is apparently pretty intrinsic to how Slurm scheduling works, and is 
unlikely to change.

We worked around this by keeping all partitions at the same priority, and using 
QOS instead for priority/preemption -- that has the unfortunate side effect of 
tying down your QOS's to be used for that purpose, but it works for our 
situation.
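
As a rough illustration of the shape of that (partition names, node ranges and QOS 
names below are made up, not our production config), it looks something like:

# slurm.conf: both partitions at the same PriorityTier; priority and preemption come from QOS
PreemptType=preempt/qos
PreemptMode=REQUEUE
PartitionName=cpu      Nodes=node[001-200] PriorityTier=1 Default=YES State=UP
PartitionName=largemem Nodes=node[150-200] PriorityTier=1 State=UP

# QOS side (sacctmgr); PriorityWeightQOS must be non-zero for the priority part to matter
sacctmgr add qos largemem
sacctmgr modify qos largemem set Priority=1000 Preempt=normal

Jobs going to largemem then have to run with --qos=largemem (or have it as their 
default QOS), which is the "tying down your QOS's" part I mentioned.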

Best of luck,
-Matt

Matt Jay
Sr. HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology




Re: [slurm-users] slurm.conf syntax checker?

2021-10-26 Thread Marcus Wagner

Hi Diego,

sorry for the delay.


On 10/18/21 14:20, Diego Zuccato wrote:

On 15/10/2021 06:02, Marcus Wagner wrote:

Mostly, our problem was that we forgot to add or remove a node to/from 
the partitions/topology file, which caused slurmctld to refuse to start. 
So I wrote a simple checker for that. Here is the output of a sample 
run:
Even "just" catching syntax errors and the most common errors is 
already a big help, expecially for noobs :)



[OK]: All nodeweights are correct.

What do you mean by this? How can weights be "incorrect"?


We are using node weights calculated from different factors, like CPU 
generation, memory, cores and available generic resources.
For example, we have some nodes with additional NVMe disks; these should be 
scheduled later than the nodes without NVMe, but can still be forced into 
scheduling by requesting the constraint nvme.
My checker calculates these weights, so I do not have to work them out 
myself; I just insert the calculated value.

Example output (instead of "[OK]: All nodeweights are correct."):
NodeName=lns[07-08] Sockets=8 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=102 Feature=broadwell,bwx8860,nvme,hostok,hpcwork Gres=gpu:pascal:1 Weight=111544(was 1) State=UNKNOWN


So the correct weight for this node type is 111544, but I had set it to "1" in 
the config file. The checker output "Weight=111544(was 1)" tells me that the 
correct value would be 111544 rather than "1".
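
The calculation itself is nothing fancy; stripped down, it has roughly this shape 
(the factor values below are made up for illustration, not our real table):

# toy sketch of the per-node weight calculation, not our real factors
calc_weight() {
    local generation=$1 mem_gb=$2 cores=$3 gres=$4
    local w=0
    case "$generation" in
        broadwell) w=$((w + 100000)) ;;
        skylake)   w=$((w + 200000)) ;;
    esac
    w=$((w + mem_gb * 10 + cores))
    # nodes with extra resources (e.g. nvme) get a higher weight, so they are picked later
    case "$gres" in *nvme*) w=$((w + 5000)) ;; esac
    echo "$w"
}
calc_weight broadwell 1024 144 "gpu:pascal:1,nvme"   # compare the result with Weight= in slurm.conf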


Best
Marcus



If someone is interested ...
Surely I am :)




--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de