I recommend the LLN option for partitions:
*LLN*
Schedule resources to jobs on the least loaded nodes (based upon the
number of idle CPUs). This is generally only recommended for an
environment with serial jobs as idle resources will tend to be
highly fragmented, resulting in parallel jobs being distributed
across many nodes. Note that node *Weight* takes precedence over how
many idle resources are on each node. Also see the
*SelectTypeParameters* configuration parameter *CR_LLN* to use the
least loaded nodes in every partition.
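For example, a minimal slurm.conf sketch (the partition and node names
are taken from your message below; everything else is illustrative, so
adjust for your site):

    # Per-partition: pick the least loaded nodes for jobs in nav only
    PartitionName=nav Nodes=node[1-5] LLN=YES State=UP

    # Cluster-wide alternative: least loaded placement in every partition,
    # combined with your existing consumable resource options
    #SelectType=select/cons_res
    #SelectTypeParameters=CR_Core_Memory,CR_LLN

Partition changes take effect with an "scontrol reconfigure"; changing
SelectType or SelectTypeParameters generally requires restarting the
Slurm daemons.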
-Paul Edmon-
On 11/15/2018 4:25 AM, Aravindh Sampathkumar wrote:
Hi All.
I'm having some trouble finding the appropriate section of the
documentation for changing Slurm's resource allocation policy.
We have configured CPU and memory as consumable resources, and our
nodes can run multiple jobs as long as CPU cores and memory are available.
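Roughly, the relevant slurm.conf lines look like this (a sketch of a
typical cons_res setup; our exact parameters may differ):

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory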
What I want is for Slurm to spread jobs across all available servers
in a partition instead of loading up a few servers while others sit idle.
For example, I have a partition nav which has 5 compute
nodes (node[1-5]) dedicated to it.
When users submit 3 jobs to the nav partition, each requesting 1 CPU core
and 1 GB of memory, Slurm schedules all three jobs on node1 because it
has enough CPU cores and memory to satisfy the job requirements, while
nodes 2-5 sit idle.
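For example, each job is submitted roughly like this (job.sh is just a
placeholder for the users' actual scripts):

    sbatch -p nav -n 1 --mem=1G job.sh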
What I want instead is for Slurm to schedule job1 to node1, job2 to
node2, job3 to node3, and so on; then, if there are ever more jobs
than nodes, Slurm should start using the remaining resources
available on node1.
Why?
A small group using this partition is concerned that all their
jobs land on the same node, where they have to share network
bandwidth and bandwidth to local disk. If the jobs were spread out
instead, each would get better bandwidth.
I'd appreciate any advice on how I can make this happen.
Thanks,
Aravindh Sampathkumar
aravi...@fastmail.com