Hi all,

I found a way to avoid oversubscription. I had to comment out this configuration:

PreemptMode=Suspend,Gang
PreemptType=preempt/partition_prio

In my current configuration, all the partitions are at the same priority. At times, I increase the priority of one partition and jobs in the other partitions are suspended. That works fine. But I still do not understand why oversubscription occurs when preemption is activated.

I would like to keep preemption by suspension without getting oversubscription. If anyone has an idea of how to do this, please let me know.

Thank you!

Stéphane

-----Original Message-----
From: Stéphane Larose
Sent: 17 April 2018 10:02
To: 'Slurm User Community List' <slurm-users@lists.schedmd.com>
Subject: RE: [slurm-users] Node OverSubscribe even if set to no

Hi Chris,

> You might want to double check the config is acting as expected with:
>
> scontrol show part | fgrep OverSubscribe

PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO

> Also what does this say?
>
> scontrol show config | fgrep SelectTypeParameters

SelectTypeParameters = CR_CPU_MEMORY

From the doc, it seems that only CR_Memory implies OverSubscribe=YES:

All CR_s assume OverSubscribe=No or OverSubscribe=Force EXCEPT for CR_MEMORY which assumes OverSubscribe=Yes

When I do "scontrol list jobs", all jobs have OverSubscribe=OK (which is not Yes).
Again from the docs it seems fine:

"OK" otherwise (typically allocated dedicated CPUs)

Thanks again,

Stéphane

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On behalf of Chris Samuel
Sent: 17 April 2018 04:29
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Node OverSubscribe even if set to no

On Tuesday, 17 April 2018 5:26:26 AM AEST Stéphane Larose wrote:

> So some jobs are now sharing the same cores but I don’t understand why
> since OverSubscribe is set to no.

You might want to double check the config is acting as expected with:

scontrol show part | fgrep OverSubscribe

Also what does this say?

scontrol show config | fgrep SelectTypeParameters

I note that if you've got CR_Memory then:

CR_Memory
    Memory is a consumable resource. NOTE: This implies OverSubscribe=YES or OverSubscribe=FORCE for all partitions. Setting a value for DefMemPerCPU is strongly recommended.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
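
For anyone repeating the check suggested in this thread on a cluster with many partitions, here is a minimal sketch that summarizes the "Key=Value" lines printed by "scontrol show part | fgrep OverSubscribe". It assumes the output has already been saved as text; it does not call scontrol itself. The helper names parse_fields and summarize are hypothetical, not part of Slurm, and the sample lines are copied from the output quoted earlier in the thread.

```python
import re

# Hypothetical helpers, not part of Slurm. They only parse text in the
# "Key=Value" style that `scontrol show part` prints.

def parse_fields(line):
    """Return a dict of the Key=Value tokens found on one line."""
    return dict(re.findall(r"(\w+)=(\S+)", line))

def summarize(text):
    """Return (OverSubscribe, PriorityTier) for every line that has OverSubscribe."""
    rows = []
    for line in text.splitlines():
        fields = parse_fields(line)
        if "OverSubscribe" in fields:
            rows.append((fields["OverSubscribe"], fields.get("PriorityTier")))
    return rows

# Sample lines copied from the thread (two of the nine identical lines).
sample = """\
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
"""

rows = summarize(sample)
# True when every partition line reports OverSubscribe=NO.
all_no = all(over == "NO" for over, _ in rows)
# True when every partition line reports the same PriorityTier.
same_tier = len({tier for _, tier in rows}) <= 1
print(rows, all_no, same_tier)
```

The two flags mirror what was checked by eye above: whether every partition reports OverSubscribe=NO, and whether all partitions sit at the same PriorityTier.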