Thanks Renfro. My scheduling policy is below.

SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=192.168.150.223
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=hpc
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmdDebug=5
Waittime=0
Epilog=/etc/slurm/slurm.epilog.clean
GresTypes=gpu
MaxJobCount=5000000
SchedulerParameters=enable_user_top,default_queue_depth=1000000

# JOB PRIORITY
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=500000
PriorityFlags=FAIR_TREE

Let me try changing it to the backfill scheduler and see if it helps.

Regards
Navin.

On Mon, Jul 13, 2020 at 5:16 PM Renfro, Michael <ren...@tntech.edu> wrote:

> “The *SchedulerType* configuration parameter specifies the scheduler
> plugin to use. Options are sched/backfill, which performs backfill
> scheduling, and sched/builtin, which attempts to schedule jobs in a strict
> priority order within each partition/queue.”
>
> https://slurm.schedmd.com/sched_config.html
>
> If you’re using the builtin scheduler, lower priority jobs have no way to
> run ahead of higher priority jobs. If you’re using the backfill scheduler,
> your jobs will need specific wall times specified, since the idea with
> backfill is to run lower priority jobs ahead of time if and only if they
> can complete without delaying the estimated start time of higher priority
> jobs.
>
> On Jul 13, 2020, at 4:18 AM, navin srivastava <navin.alt...@gmail.com>
> wrote:
>
> Hi Team,
>
> We have separate partitions for the GPU nodes and the CPU-only nodes.
>
> Scenario: the jobs submitted in our environment request 4 CPUs + 1 GPU, or
> 4 CPUs only, in the nodeGPUsmall and nodeGPUbig partitions. When all GPUs
> are exhausted and the remaining GPU jobs are queued waiting for GPU
> resources, the CPU-only jobs do not go through either, even though plenty
> of CPU resources are available; they stay pending behind the GPU jobs
> because the GPU jobs have higher priority than the CPU-only ones.
>
> Is there any option so that when all GPU resources are exhausted, the
> CPU-only jobs are still allowed to run? Is there a way to deal with this,
> or some custom solution we could consider? There is no issue with the
> CPU-only partitions.
>
> Below is my Slurm configuration file:
>
> NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=128833 State=UNKNOWN
> NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10 RealMemory=515954 Feature=HIGHMEM State=UNKNOWN
> NodeName=node[28-32] NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28 RealMemory=257389
> NodeName=node[32-33] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773418
> NodeName=node[17-27] NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18 RealMemory=257687 Feature=K2200 Gres=gpu:2
> NodeName=node[34] NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773410 Feature=RTX Gres=gpu:8
>
> PartitionName=node Nodes=node[1-10,14-16,28-33,35] Default=YES MaxTime=INFINITE State=UP Shared=YES
> PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES
> PartitionName=nodeGPUbig Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES
>
> Regards
> Navin.
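For reference, a minimal sketch of the change being discussed. It assumes only what the thread states (switch SchedulerType to sched/backfill and give jobs definite wall times); the bf_* tuning values, the DefaultTime/MaxTime limits, and the job script name are illustrative assumptions, not settings taken from Navin's cluster:

# slurm.conf: replace the builtin scheduler with backfill
SchedulerType=sched/backfill
# existing parameters kept; bf_window (minutes) and bf_continue are example tuning only
SchedulerParameters=enable_user_top,default_queue_depth=1000000,bf_window=1440,bf_continue

# Backfill can only slot a lower-priority CPU job ahead of pending GPU jobs
# if its wall time is known, so give the partition a default/maximum limit
# (values here are placeholders)...
PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO DefaultTime=02:00:00 MaxTime=3-00:00:00 State=UP Shared=YES

# ...and/or have users request a wall time explicitly at submission:
sbatch --partition=nodeGPUsmall --ntasks=4 --time=02:00:00 job.sh

Note that slurmctld must be restarted for a change of SchedulerType to take effect.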