The intention there is to pack jobs on the smallest node that can handle the job.

This way a job that only needs 1 CPU doesn't take it from a 64-core node unless it has to, leaving that node available for a 64-core job.
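For example, a minimal slurm.conf sketch of that approach (node names and counts here are hypothetical, not from any particular cluster) gives larger weights to larger nodes so the scheduler fills the small ones first:

    # Hypothetical example: lower weight = preferred, so small nodes fill up first
    NodeName=small[01-04]  CPUs=8   Weight=10
    NodeName=medium[01-04] CPUs=32  Weight=50
    NodeName=big[01-02]    CPUs=64  Weight=100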


It really boils down to what you want to happen, which will vary with each installation.


Brian Andrus


On 9/5/2019 8:48 AM, Douglas Duckworth wrote:
Hello

We added some newer EPYC nodes, with NVMe scratch, to our cluster and want jobs to run on these over the others. So we added "Weight=100" to the older nodes and left the new ones blank. Indeed, ceteris paribus, srun shows that the faster nodes accept jobs over the older ones.
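Something like the following sketch (node names are hypothetical) describes that setup; the new nodes keep the default weight of 1:

    # Hypothetical node names; new EPYC nodes keep the default Weight=1
    NodeName=epyc[01-04]            # Weight omitted, defaults to 1
    NodeName=old[01-16]  Weight=100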

We have the desired outcome, though I am a bit confused by two statements in the manpage <https://slurm.schedmd.com/slurm.conf.html> that seem contradictory:

"All things being equal, jobs will be allocated the nodes with the lowest weight which satisfies their requirements."

"...larger weights should be assigned to nodes with more processors, memory, disk space, higher processor speed, etc."

100 is larger than 1, and we do see jobs preferring the new nodes, which have the default weight of 1. Yet we're also told to assign larger weights to faster nodes?

Thanks!
Doug

--
Thanks,

Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit <https://scu.med.cornell.edu/>
Weill Cornell Medicine
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690
