Re: [slurm-users] Slurm configuration, Weight Parameter

Sarlo, Jeffrey S Thu, 05 Dec 2019 10:33:28 -0800

Yes, we run weights together with priority/multifactor.

Jeff


From: Sistemas NLHPC [mailto:[email protected]]
Sent: Thursday, December 05, 2019 12:01 PM
To: Sarlo, Jeffrey S; Slurm User Community List
Subject: Re: [slurm-users] Slurm configuration, Weight Parameter

Thanks Jeff!

We upgraded Slurm to 18.08.4 and Weight now works! Can the parameter also be 
used together with the priority/multifactor plugin?

Thanks in advance

Regards

On Tue, Dec 3, 2019 at 17:37, Sarlo, Jeffrey S 
(<[email protected]<mailto:[email protected]>>) wrote:
Which version of Slurm are you using?  I know that in the early versions of 
18.08, prior to 18.08.4, there was a bug with weights not working.  Once we got 
past 18.08.4, weights worked for us.

Jeff
University of Houston - HPC

From: slurm-users 
[mailto:[email protected]<mailto:[email protected]>]
 On Behalf Of Sistemas NLHPC
Sent: Tuesday, December 03, 2019 12:33 PM
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm configuration, Weight Parameter

Hi Renfro

I am testing with the following configuration, kept as clean as possible:

====

NodeName=devcn050 RealMemory=3007 Features=3007MB Weight=200 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn002 RealMemory=3007 Features=3007MB Weight=1 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn001 RealMemory=2000 Features=2000MB Weight=500 State=idle Sockets=2 CoresPerSocket=1

PartitionName=slims Nodes=devcn001,devcn002,devcn050 Default=yes Shared=yes State=up

===

Does your configuration need an extra plugin or parameter for the Weight 
option?

With this configuration, Weight does not work as expected.
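
For reference, the behaviour one would expect from the test configuration above can be sketched in a few lines of Python. This is only a simplified model, not Slurm's scheduler code: it assumes all three nodes are idle, that a single-node job is filtered by requested memory, and that the eligible node with the lowest Weight is preferred. The function name `expected_order` is made up for illustration.

```python
# Simplified model of weight-based node selection; not Slurm's actual code.
# Node values are copied from the test configuration above.
nodes = [
    {"name": "devcn050", "real_memory": 3007, "weight": 200},
    {"name": "devcn002", "real_memory": 3007, "weight": 1},
    {"name": "devcn001", "real_memory": 2000, "weight": 500},
]


def expected_order(nodes, mem_mb):
    """Return names of nodes with enough memory, lowest Weight first."""
    eligible = [n for n in nodes if n["real_memory"] >= mem_mb]
    return [n["name"] for n in sorted(eligible, key=lambda n: n["weight"])]


# A small job fits everywhere, so devcn002 (Weight=1) should be tried first.
print(expected_order(nodes, 1000))
# A 2500 MB job cannot use devcn001 (RealMemory=2000) at all.
print(expected_order(nodes, 2500))
```

Under these assumptions, devcn001 (Weight=500) should only receive a job once the other two nodes cannot take it.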

Regards,

On Sat, Nov 30, 2019 at 10:30, Renfro, Michael 
(<[email protected]<mailto:[email protected]>>) wrote:
We’ve been using that weighting scheme for a year or so, and it works as 
expected. Not sure how Slurm would react to multiple NodeName=DEFAULT lines 
like you have, but here’s our node settings and a subset of our partition 
settings.

In our environment, we’d often have lots of idle cores on GPU nodes, since 
those jobs tend to be GPU-bound rather than CPU-bound. So in one of our 
interactive partitions, we let non-GPU jobs take up to 12 cores of a GPU node. 
Additionally, we have three memory configurations in our main batch partition. 
We want to bias jobs to running on the smaller-memory nodes by default. And the 
same principle applies to our GPU partition, where the smaller-memory GPU nodes 
get jobs before the larger-memory GPU node.

=====

NodeName=gpunode[001-003]  CoresPerSocket=14 RealMemory=382000 Sockets=2 ThreadsPerCore=1 Weight=10011 Gres=gpu:2
NodeName=gpunode004  CoresPerSocket=14 RealMemory=894000 Sockets=2 ThreadsPerCore=1 Weight=10021 Gres=gpu:2
NodeName=node[001-022]  CoresPerSocket=14 RealMemory=62000 Sockets=2 ThreadsPerCore=1 Weight=10201
NodeName=node[023-034]  CoresPerSocket=14 RealMemory=126000 Sockets=2 ThreadsPerCore=1 Weight=10211
NodeName=node[035-040]  CoresPerSocket=14 RealMemory=254000 Sockets=2 ThreadsPerCore=1 Weight=10221

PartitionName=any-interactive Default=NO MinNodes=1 MaxNodes=4 MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=12 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040],gpunode[001-004]

PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]

PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]

=====
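
To make the effect of those Weight values easier to see, here is a small Python sketch that pulls the Weight out of the NodeName lines and lists the node groups in the order they would be preferred, lowest Weight first. It is an illustration only: the node lines are abbreviated to name and Weight, the helper names `weights` and `preference` are invented, and selecting a partition's nodes by name prefix is a shortcut that happens to fit this node naming, not something Slurm does.

```python
import re

# Node lines abbreviated from the settings above (only name and Weight kept).
CONF = """
NodeName=gpunode[001-003] Weight=10011
NodeName=gpunode004 Weight=10021
NodeName=node[001-022] Weight=10201
NodeName=node[023-034] Weight=10211
NodeName=node[035-040] Weight=10221
"""


def weights(conf):
    """Map each NodeName expression to its integer Weight."""
    pattern = re.compile(r"NodeName=(\S+).*?Weight=(\d+)")
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(conf)}


def preference(conf, prefix):
    """Node groups whose name starts with prefix, lowest Weight first."""
    w = weights(conf)
    return sorted((name for name in w if name.startswith(prefix)), key=w.get)


# batch (node[001-040]): smaller-memory nodes come first.
print(preference(CONF, "node"))
# gpu (gpunode[001-004]): the larger-memory GPU node comes last.
print(preference(CONF, "gpunode"))
# any-interactive (all nodes): GPU nodes have the lowest weights overall.
print(preference(CONF, ""))
```

In the any-interactive partition the GPU nodes carry the lowest weights, which matches the goal of letting non-GPU jobs soak up idle cores there, while MaxCPUsPerNode=12 caps how many cores such jobs can take on each node.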

> On Nov 29, 2019, at 8:09 AM, Sistemas NLHPC 
> <[email protected]<mailto:[email protected]>> wrote:
>
> Hi All,
>
> Thanks, all, for your posts.
>
> According to the Slurm documentation and other sites such as Niflheim 
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight (Ole Holm 
> Nielsen), the "Weight" parameter assigns a value to each node so that some 
> nodes are preferred over others. However, I have not been able to get it to 
> work.
>
> Thanks in advance
>
> Regards
>
> On Sat, Nov 23, 2019 at 14:18, Chris Samuel 
> (<[email protected]<mailto:[email protected]>>) wrote:
> On 23/11/19 9:14 am, Chris Samuel wrote:
>
> > My gut instinct (and I've never tried this) is to make the 3GB nodes be
> > in a separate partition that is guarded by AllowQos=3GB and have a QOS
> > called "3GB" that uses MinTRESPerJob to require jobs to ask for more
> > than 2GB of RAM to be allowed into the QOS.
>
> Of course there's nothing to stop a user requesting more memory than
> they need to get access to these nodes, but that's a social issue not a
> technical one. :-)
>
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>
