Re: [slurm-users] SLES 15 rpmbuild from 20.02.5 tarball wants munge-libs: system munge RPMs don't provide it

2020-10-21 Thread Kevin Buckley
On 2020/10/21 13:11, Christopher Samuel wrote: I guess the question is (going back to your initial post): > error: Failed build dependencies: > munge-libs is needed by slurm-20.02.5-1.x86_64 Had you installed libmunge2 before trying this build? rpmbuild can't install it for you if
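A quick way to check the state Chris is asking about, sketched here with SLES 15 package names assumed (libmunge2 for the runtime library, munge-devel for the headers); the --nodeps fallback only makes sense if the library really is installed and the failure is purely the missing "munge-libs" capability:

  rpm -q libmunge2 munge-devel                  # is the system munge library/devel pair installed?
  rpm -q --whatprovides munge-libs              # does any installed package provide the capability rpmbuild wants?
  rpmbuild -ta --nodeps slurm-20.02.5.tar.bz2   # last resort: skip the rpm dependency check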

[slurm-users] Slurm not enforcing gres requests at submission

2020-10-21 Thread Jason Macklin
Good day everyone! We have a GPU-only cluster with Slurm 19.05.5 installed. My expectation with respect to the current configuration is that users submitting jobs must include at least one of the following header options: --gres --gpus --gpus-per-node --gpus-per-socket --gpus-per-task The
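For reference, a minimal sketch of the request forms being discussed and one way to see what a submitted job actually asked for (the job-script lines and job id are placeholders, not taken from the original post):

  #SBATCH --gres=gpu:1          # generic-resource form
  #SBATCH --gpus-per-task=1     # one of the newer --gpus* forms (Slurm 19.05+)

  scontrol show job <jobid> | grep -i tres   # shows the TRES the job actually requested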

Re: [slurm-users] Use gres to handle permissions of /dev/dri/card* and /dev/dri/renderD*?

2020-10-21 Thread Daniel Letai
Just a quick addendum - rsmi_dev_drm_render_minor_get used in the plugin references the ROCM-SMI lib from https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/2e8dc4f2a91bfa7661f4ea289736b12153ce23c2/src/rocm_smi.cc#L1689 So the library (as an .so file) should be installe

Re: [slurm-users] Use gres to handle permissions of /dev/dri/card* and /dev/dri/renderD*?

2020-10-21 Thread Daniel Letai
Take a look at https://github.com/SchedMD/slurm/search?q=dri%2F If the ROCM-SMI API is present, using AutoDetect=rsmi in gres.conf might be enough, if I'm reading this right. Of course, this assumes the cards in question are AMD and not NVIDIA.
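A minimal sketch of what that would look like, assuming AMD GPUs and a slurmd built against rocm_smi_lib (the node name and GPU count below are placeholders, and rsmi autodetection needs a sufficiently recent Slurm):

  # gres.conf on the GPU node
  AutoDetect=rsmi

  # slurm.conf
  GresTypes=gpu
  NodeName=gpu-node-01 Gres=gpu:2 ...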

[slurm-users] Partition QOS limit not being enforced

2020-10-21 Thread Durai Arasan
Hello, We recently created a new partition with the following slurm.conf and QOS settings: cat /etc/slurm/slurm.conf | grep part-long PartitionName=part-long Nodes=node-1,node-2,node-3 Default=YES, AllowAccounts=group1,group2 TRESBillingWeights="gres/gpu=22" MaxNodes=1 MaxTime=10-0 QOS=long-10
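A hedged checklist for this kind of problem (the QOS name is truncated above, so nothing here is taken from the actual site): a partition QOS only takes effect if the QOS exists in the Slurm database and enforcement is enabled in slurm.conf.

  sacctmgr show qos                                  # the QOS named in QOS= must exist here, with its limits set
  scontrol show partition part-long | grep QoS       # confirm the partition actually picked it up
  AccountingStorageEnforce=associations,limits,qos   # slurm.conf: required for QOS limits to be enforced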

Re: [slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Riebs, Andy
Thanks for the additional information, Stephan! At this point, I’ll have to ask for anyone with more job array experience than I have (because I have none!) to speak up. Remember that we’re all in this together(*), so any help that anyone can offer will be good! Andy (*) Well, actually, I’m r

Re: [slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Bernd Melchers
> Hi everyone, I am having doubts regarding array jobs. To me it seems that the JobArrayTaskLimit has precedence over the Fairshare, as users with a way lower priority seem to get constant allocations for their array jobs, compared to users with "normal" jobs. Can someone con

Re: [slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Stephan Schott
And I forgot to mention, things are running in a Qlustar cluster based on Ubuntu 18.04.4 LTS Bionic. 😬 On Wed, 21 Oct 2020 at 15:38, Stephan Schott () wrote: > Oh, sure, sorry. We are using slurm 18.08.8, with a backfill scheduler. The jobs are being assigned to the same partition, wh

Re: [slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Stephan Schott
Oh, sure, sorry. We are using slurm 18.08.8, with a backfill scheduler. The jobs are being assigned to the same partition, which limits gpus and cpus to 1 via QOS. Here are some of the main flags: SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL" T
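For threads like this one, a few standard commands show what the scheduler actually computed, which is useful when comparing array and non-array jobs (nothing here is site-specific):

  sprio -l                  # per-pending-job priority split into age, fairshare, partition and QOS factors
  sshare -a -l              # per-account/user fairshare factors and raw usage
  squeue -o "%A %u %Q %T"   # job id, user, composite priority, state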

Re: [slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Riebs, Andy
Also, of course, any information you can provide about how the system is configured (scheduler choices, QOS options, and the like) would help in answering your question. From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Riebs, Andy Sent: Wednesday, O

Re: [slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Riebs, Andy
Stephan (et al.), There are probably 6 versions of Slurm in common use today, across multiple versions each of Debian/Ubuntu, SuSE/SLES, and RedHat/CentOS/Fedora. You are more likely to get a good answer if you offer some hints about what you are running! Regards, Andy From: slurm-users [mail

Re: [slurm-users] Reserve some cores per GPU

2020-10-21 Thread Stephan Schott
This is related to this other thread: https://groups.google.com/g/slurm-users/c/88pZ400whu0/m/9FYFqKh6AQAJ AFAIK, the only rudimentary solution is the MaxCPUsPerNode partition flag, and setting independent gpu and cpu partitions, but having something like "CpusReservedPerGpu" would be nice. @Aaron
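A minimal sketch of the MaxCPUsPerNode workaround mentioned above, with hypothetical node names and core counts (two overlapping partitions on 64-core GPU nodes, with CPU-only jobs capped so some cores stay free for GPU jobs):

  PartitionName=gpu Nodes=node[01-04] MaxTime=...
  PartitionName=cpu Nodes=node[01-04] MaxCPUsPerNode=48 MaxTime=...   # leaves 16 cores per node for the gpu partition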

[slurm-users] Array jobs vs Fairshare

2020-10-21 Thread Stephan Schott
Hi everyone, I am having doubts regarding array jobs. To me it seems that the JobArrayTaskLimit has precedence over the Fairshare, as users with a way lower priority seem to get constant allocations for their array jobs, compared to users with "normal" jobs. Can someone confirm this? Cheers, -- S

Re: [slurm-users] sshare RawUsage vs sreport usage

2020-10-21 Thread Stephan Schott
For the record, the issue seemed to be related to a low CPU weight in TRESBillingWeights being applied to different partitions. Removing it or increasing the value made the accounting work again for all users. On Wed, 26 Aug 2020 at 17:54, Stephan Schott () wrote: > Still stuck with this
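For context, a hedged example of the kind of line involved; the weights are illustrative only, the point being that an explicit, sane CPU weight sits alongside the GPU weight:

  PartitionName=part-long ... TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=22.0"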