The intention there is to pack jobs on the smallest node that can handle
the job.
This way a job that only needs 1 CPU doesn't take it from a 64-core node
unless it has to, leaving that node available for a 64-core job.
It really boils down to what you want to happen, which will vary with
each installation.
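As a rough sketch of that idea (node names and hardware here are made up
for illustration), you'd weight nodes by size so the small ones fill first:

    # Hypothetical slurm.conf node lines: lower Weight = allocated first,
    # so 1-CPU jobs land on the small nodes and the 64-core nodes stay free.
    NodeName=small[01-04]  CPUs=8   RealMemory=32000   Weight=10
    NodeName=medium[01-04] CPUs=32  RealMemory=128000  Weight=50
    NodeName=big[01-02]    CPUs=64  RealMemory=256000  Weight=100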
Brian Andrus
On 9/5/2019 8:48 AM, Douglas Duckworth wrote:
Hello
We added some newer Epyc nodes, with NVMe scratch, to our cluster and
want jobs to run on these ahead of the others. So we added "Weight=100"
to the older nodes and left the new ones blank. Indeed, ceteris
paribus, srun reveals that the faster nodes will accept jobs over the
older ones.
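For reference, a sketch of that kind of layout (node names and counts are
placeholders), where only the older nodes carry an explicit weight and the
new ones keep the default of 1:

    # Hypothetical node lines: old nodes weighted 100, Epyc nodes left at the
    # default Weight=1, so all else equal Slurm allocates the Epyc nodes first.
    NodeName=old[01-10]  CPUs=28 Weight=100
    NodeName=epyc[01-04] CPUs=64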
We have the desired outcome, though I am a bit confused by two
statements in the manpage <https://slurm.schedmd.com/slurm.conf.html>
that seem contradictory:
"All things being equal, jobs will be allocated the nodes with the
lowest weight which satisfies their requirements."
"...larger weights should be assigned to nodes with more processors,
memory, disk space, higher processor speed, etc."
100 is larger than 1, and we do see jobs preferring the new nodes,
which have the default weight of 1. Yet we're also told to assign larger
weights to faster nodes?
Thanks!
Doug
--
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit <https://scu.med.cornell.edu/>
Weill Cornell Medicine"
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690