On 2020/10/21 13:11, Christopher Samuel wrote:
I guess the question is (going back to your initial post):
> error: Failed build dependencies:
> munge-libs is needed by slurm-20.02.5-1.x86_64
Had you installed libmunge2 before trying this build? rpmbuild can't install it
for you if it's not already there.
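For what it's worth, the usual sequence on an RPM-based system looks roughly
like this (the package names are an assumption and differ between distributions;
on RHEL/CentOS the build dependency is normally satisfied by munge-libs and
munge-devel):

  # install MUNGE and its development files before building
  yum install munge munge-libs munge-devel
  # then rebuild the Slurm packages from the release tarball
  rpmbuild -ta slurm-20.02.5.tar.bz2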
Good day everyone!
We have a GPU-only cluster running Slurm 19.05.5. My expectation, given the
current configuration, is that users submitting jobs must include at least one
of the following header options (a minimal submission example follows the list):
--gres
--gpus
--gpus-per-node
--gpus-per-socket
--gpus-per-task
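A minimal submission script meeting that expectation (partition name, GPU
count, and the program are placeholders):

  #!/bin/bash
  #SBATCH --job-name=gpu-test
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:1        # or one of the --gpus* options listed above
  #SBATCH --ntasks=1
  #SBATCH --time=01:00:00

  srun ./my_gpu_program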
Just a quick addendum - rsmi_dev_drm_render_minor_get, used in the plugin,
references the ROCM-SMI lib from
https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/2e8dc4f2a91bfa7661f4ea289736b12153ce23c2/src/rocm_smi.cc#L1689
So the library (as an .so file) should be installed.
Take a look at https://github.com/SchedMD/slurm/search?q=dri%2F
If the ROCM-SMI API is present, using AutoDetect=rsmi in
gres.conf might be enough, if I'm reading this right.
Of course, this assumes the cards in question are AMD and not
NVIDIA.
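Assuming AMD cards with the ROCm SMI library installed on the nodes, the
relevant pieces of configuration might look roughly like this (node name and
GPU count are made up, and the rest of the node definition is omitted):

  # slurm.conf
  GresTypes=gpu
  NodeName=gpu-node01 Gres=gpu:4 ...

  # gres.conf on the GPU nodes - let Slurm query the ROCm SMI library
  AutoDetect=rsmi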
Hello,
We recently created a new partition with the following slurm.conf and QOS
settings:
cat /etc/slurm/slurm.conf | grep part-long
PartitionName=part-long Nodes=node-1,node-2,node-3 Default=YES
AllowAccounts=group1,group2 TRESBillingWeights="gres/gpu=22" MaxNodes=1
MaxTime=10-0 QOS=long-10
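To double-check what that partition and the long-10 QOS actually enforce, the
live settings can be printed with the standard tools:

  scontrol show partition part-long
  sacctmgr show qos long-10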
Thanks for the additional information, Stephan!
At this point, I’ll have to ask anyone with more job array experience than I
have (because I have none!) to speak up.
Remember that we’re all in this together(*), so any help that anyone can offer
will be good!
Andy
(*) Well, actually, I’m r
> Hi everyone,
> I am having doubts regarding array jobs. It seems to me that
> JobArrayTaskLimit takes precedence over Fairshare, as users with much
> lower priority seem to get constant allocations for their array
> jobs, compared to users with "normal" jobs. Can someone confirm this?
And I forgot to mention, things are running in a Qlustar cluster based on
Ubuntu 18.04.4 LTS Bionic. 😬
On Wed, Oct 21, 2020 at 3:38 PM, Stephan Schott ()
wrote:
> Oh, sure, sorry.
> We are using slurm 18.08.8, with a backfill scheduler. The jobs are being
> assigned to the same partition, which limits gpus and cpus to 1 via QOS.
Oh, sure, sorry.
We are using slurm 18.08.8, with a backfill scheduler. The jobs are being
assigned to the same partition, which limits gpus and cpus to 1 via QOS.
Here are some of the main flags:
SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty
--preserve-env --mpi=none $SHELL"
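For anyone reading along, a QOS that caps jobs at one CPU and one GPU would
typically be set up with something like the following (the QOS name is made
up, and whether the cap is per job or per user depends on the intent):

  # create the QOS, then attach TRES limits to it
  sacctmgr add qos gpu-single
  sacctmgr modify qos gpu-single set MaxTRESPerJob=cpu=1,gres/gpu=1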
Also, any information that you can provide about how the system is configured
(scheduler choices, QOS options, and the like) would help in answering your
question.
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Riebs, Andy
Sent: Wednesday, October 21, 2020
Stephan (et al.),
There are probably 6 versions of Slurm in common use today, across multiple
versions each of Debian/Ubuntu, SuSE/SLES, and RedHat/CentOS/Fedora. You are
more likely to get a good answer if you offer some hints about what you are
running!
Regards,
Andy
From: slurm-users [mail
This is related to this other thread:
https://groups.google.com/g/slurm-users/c/88pZ400whu0/m/9FYFqKh6AQAJ
AFAIK, the only rudimentary solution is the MaxCPUsPerNode partition flag,
and setting independent gpu and cpu partitions, but having something like
"CpusReservedPerGpu" would be nice.
@Aaron
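A sketch of the MaxCPUsPerNode approach mentioned above (node name, core
count, and GPU count are invented):

  # slurm.conf - put the node in two overlapping partitions and cap the
  # CPU-only partition so some cores always remain free for GPU jobs
  NodeName=node-1 CPUs=32 Gres=gpu:4
  PartitionName=cpu Nodes=node-1 MaxCPUsPerNode=28
  PartitionName=gpu Nodes=node-1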
Hi everyone,
I am having doubts regarding array jobs. It seems to me that JobArrayTaskLimit
takes precedence over Fairshare, as users with much lower priority seem to get
constant allocations for their array jobs, compared to users with "normal"
jobs. Can someone confirm this?
Cheers,
--
Stephan
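For context, "JobArrayTaskLimit" here presumably refers to the squeue reason
code shown when an array has hit its own running-task throttle, i.e. the %N
suffix on the --array range. A quick illustration (script name and numbers are
made up):

  # at most 10 of the 100 tasks run at once; the remaining tasks wait with
  # reason JobArrayTaskLimit, independent of their fairshare priority
  sbatch --array=0-99%10 my_array_job.sh

  # compare pending-job priorities and their fairshare components
  sprio -l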
For the record, the issue seemed to be related to a low CPU weight in
TRESBillingWeights being applied to different partitions. Removing it or
increasing the value made the accounting work again for all users.
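For anyone hitting the same thing, the weights live on the partition line;
something along these lines keeps the CPU weight from dragging billing down
(the values are illustrative only, and the other partition options are omitted):

  PartitionName=part-long ... TRESBillingWeights="CPU=1.0,gres/gpu=22"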
On Wed, Aug 26, 2020 at 5:54 PM, Stephan Schott ()
wrote:
> Still stuck with this