> at the end of the week, so I'm not sure that I'll have a lot of Slurm in
> my life, going forward :)
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *Stephan Schott
> *Sent:* Wednesday, October 21, 2020 9:40 AM
> *To:* Slurm User Community List
And I forgot to mention, things are running in a Qlustar cluster based on
Ubuntu 18.04.4 LTS Bionic. 😬
On Wed, Oct 21, 2020 at 15:38, Stephan Schott ()
wrote:
> Oh, sure, sorry.
> We are using slurm 18.08.8, with a backfill scheduler. The jobs are being
> assigned to the same
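For context, the backfill scheduler mentioned above is selected in slurm.conf
with a SchedulerType line; the tuning parameters shown below are generic
illustrations, not values taken from this cluster:

    SchedulerType=sched/backfill
    # optional backfill tuning (illustrative): keep scanning after hitting a
    # blocked job, and look up to 24 hours (1440 minutes) ahead
    SchedulerParameters=bf_continue,bf_window=1440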
> RedHat/CentOS/Fedora. You are more likely to get a good answer if you offer
> some hints about what you are running!
>
>
>
> Regards,
>
> Andy
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com
> ] *On Behalf Of *Stephan Schott
> *Sent:*
> >> Take for example a node that has:
> >>
> >> * four GPUs
> >>
> >> * 16 CPUs
> >>
> >>
> >> Let's assume that most jobs would work just fine with a minimum number
> >> of 2 CPUs per GPU. Then we could set in the node definition
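As a rough sketch of what such a node definition could look like in slurm.conf
and gres.conf (the node name, memory size, and GPU device paths below are
invented for illustration, not taken from the original message):

    # slurm.conf: a node exposing 16 CPUs and 4 GPUs
    NodeName=gpunode01 CPUs=16 RealMemory=128000 Gres=gpu:4 State=UNKNOWN

    # gres.conf on that node, mapping the four GPU devices
    NodeName=gpunode01 Name=gpu File=/dev/nvidia[0-3]

On newer Slurm releases that ship the cons_tres select plugin, a default ratio
such as 2 CPUs per GPU can also be set per partition via DefCpuPerGPU=2; that
option does not exist in 18.08.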
is?
Cheers,
--
Stephan Schott Verdugo
Biochemist
Heinrich-Heine-Universitaet Duesseldorf
Institut fuer Pharm. und Med. Chemie
Universitaetsstr. 1
40225 Duesseldorf
Germany
For the record, the issue seemed to be related to a low CPU weight in
TRESBillingWeights being applied to different partitions. Removing it or
increasing the value made the accounting work again for all users.
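For anyone running into something similar: TRESBillingWeights is set per
partition in slurm.conf, and a very small CPU weight shrinks the billed usage
of CPU-only jobs, which is consistent with the symptom described above. A
minimal sketch (partition name, node list, and weights are invented for
illustration):

    # slurm.conf: per-TRES billing weights on one partition
    PartitionName=gpu Nodes=gpunode[01-04] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"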
On Wed, Aug 26, 2020 at 17:54, Stephan Schott ()
wrote:
> Still stuck w
> WaitTime = 0 sec
> X11Parameters = (null)
>
> Cgroup Support Configuration:
> AllowedDevicesFile  = /etc/slurm/cgroup_allowed_devices_file.conf
> AllowedKmemSpace    = (null)
> AllowedRAMSpace     = 100.0%
> AllowedSwapSpace    = 0.0%
> CgroupAutomount     = yes
> CgroupMountpoint    = /sys/fs/cgroup
> ConstrainCores      = yes
> ConstrainDevices    = yes
> ConstrainKmemSpace  = no
> ConstrainRAMSpace   = yes
> ConstrainSwapSpace  = yes
> MaxKmemPercent      = 100.0%
> MaxRAMPercent       = 100.0%
> MaxSwapPercent      = 100.0%
> MemorySwappiness    = (null)
> MinKmemSpace        = 30 MB
> MinRAMSpace         = 30 MB
> TaskAffinity        = no
>
> Slurmctld(primary) at sms.mycluster is UP
>
>
>
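For reference, the "Cgroup Support Configuration" section shown above is
normally driven by /etc/slurm/cgroup.conf; a sketch that would yield roughly
that output (only the non-default-looking lines are guessed here) might be:

    # cgroup.conf: constrain cores, devices, RAM and swap; leave kmem unconstrained
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainDevices=yes
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    ConstrainKmemSpace=no
    AllowedSwapSpace=0
    # device whitelist referenced in the output above
    AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf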
d are using more or less the same partitions. The only
difference I saw was the usage of array jobs instead of normal batch jobs,
but I have no idea why that would cause differences; we are now running
some tests to check if that is actually the case.
Any ideas are welcome,
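In case it helps with reproducing this, a job array is a single batch script
submitted once with an --array range, as opposed to many separate sbatch
calls; a minimal sketch (script name, indices, and program are made up):

    #!/bin/bash
    #SBATCH --job-name=array_test
    #SBATCH --array=0-9
    #SBATCH --cpus-per-task=1
    # each task of the array gets its own index
    srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat

In sacct the array tasks then show up as JOBID_TASKID records rather than ten
independent job IDs.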
--
Stephan Schott Verdugo