Thank you for your reply, and apologies for not reacting sooner; I have
been kept busy until now. I have attached our partition definitions to
this mail.
As for your second question: MPI jobs aren't really an issue in our
cluster. There are a few now and then, but not nearly enough to explain
up to 20 nodes at a time remaining idle for up to a day, given that we
have the backfill scheduler configured and there are jobs in the short
partition that would fit on the nodes that remain idle.
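For reference, the idle nodes, the pending short jobs, and the backfill
configuration can all be checked with standard commands along these
lines (generic examples, not our exact invocations):

# pending jobs in the short partition, with their reasons
squeue -p short -t PENDING -o "%.10i %.8u %.6D %.11l %R"
# nodes currently sitting idle
sinfo -t idle -o "%.12P %.6D %N"
# confirm the scheduler type and backfill parameters
scontrol show config | grep -E "SchedulerType|SchedulerParameters"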
Kind regards,
Erik Eisold
On 14/08/2020 14:18, Renfro, Michael wrote:
We’ve run a similar setup since I moved to Slurm 3 years ago, with no
issues. Could you share partition definitions from your slurm.conf?
When you see a bunch of jobs pending, which ones have a reason of
“Resources”? Those should be the next ones to run, and ones with a
reason of “Priority” are waiting for higher priority jobs to start
(including the ones marked “Resources”). The only time I’ve seen nodes
sit idle is when there’s an MPI job pending with “Resources”, and if
any smaller jobs started, it would delay that job’s start.
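In case it helps, the pending list together with reasons and priorities
can be pulled with a generic squeue call along these lines (the format
string is just one example):

# pending jobs, highest priority first, with the scheduler's reason for each
squeue --state=PENDING --sort=-p,i -o "%.10i %.9P %.10Q %.8u %.11l %r"

The ones at the top with reason "Resources" should be the next to
start; anything showing "Priority" is simply queued behind them.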
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology
Services
931 372-3601 / Tennessee Tech University
On Aug 14, 2020, at 4:20 AM, Erik Eisold <eis...@pks.mpg.de> wrote:
Our node topology is a bit special: almost all our nodes are in one
common partition, a subset of those nodes is then in another partition,
and this repeats once more. The only difference between the partitions,
apart from the nodes in them, is the maximum run time. The reason I
originally set it up this way was to ensure that users with shorter jobs
had a quicker response time and that the whole cluster wouldn't be
clogged up with long-running jobs for days on end; I was also new to the
whole cluster setup and to Slurm itself at the time. I have attached a
rough visualization of this setup to this mail. There are two more
completely separate partitions that are not in this image.
My idea for a solution would be to move all nodes into one common
partition and use partition QOS to implement the time and resource
restrictions, because I don't think the scheduler is really meant to
handle the type of setup we chose in the beginning.
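Very roughly, and with purely placeholder names and limits (nothing
here reflects final values), the idea would look something like this:

# one QOS per current run-time class, created in the accounting database
sacctmgr add qos short_qos
sacctmgr modify qos short_qos set MaxWall=02:00:00
sacctmgr add qos medium_qos
sacctmgr modify qos medium_qos set MaxWall=2-00:00:00
sacctmgr add qos long_qos
sacctmgr modify qos long_qos set MaxWall=14-00:00:00 MaxTRESPerUser=cpu=128

# single partition over all nodes in slurm.conf, restricted to those QOS
PartitionName=main Nodes=ALL Default=YES State=UP MaxTime=14-00:00:00 AllowQos=short_qos,medium_qos,long_qos

Users would then select the appropriate limit class with --qos on their
jobs instead of choosing between partitions.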
PartitionName=debug Nodes=eos[01-02],kalyke[01-72],oberon[01-16],triton[01-16] MaxTime=10:00 DefaultTime=10:00 State=UP PriorityJobFactor=1 QOS=default
PartitionName=extra_long Nodes=iris[01-12],osiris[01-05] MaxTime=28-00:00:00 DefaultTime=02:00:00 State=UP PriorityJobFactor=100 QOS=default
PartitionName=graphic Nodes=dione[01-06],icarus[01-05] MaxTime=2-00:00:00 DefaultTime=01:00:00 State=UP QOS=default PriorityJobFactor=100
PartitionName=long Nodes=amun[01-10],anubis[01-16],apollo[01-04],elara[01-60],flora[01-10],gaspra[01-52],hathor[01-08],hermes[01-08],horus[01-10],ida[01-08],io[01-32],iris[01-12],isis[01-12],kalyke[01-72],kepler[01-32],merkur[01-32],metis[01-72],mimas[01-08],oberon[01-16],pallas[01-08],rhea[01-32],seth[01-02],sinope[01-72],titan[01-08],thalia[01-60],triton[01-16],tycho[01-32] MaxTime=14-00:00:00 DefaultTime=2-00:00:00 State=UP QOS=default PriorityJobFactor=100
PartitionName=medium Nodes=amun[01-10],ananke[01-04],anubis[01-16],apollo[01-04],ceres[01-04],elara[01-60],flora[01-10],gaspra[29-52],hekat[01-05],hermes[01-08],horus[01-10],ida[01-08],io[01-32],iris[01-12],isis[01-12],juno[01-40],kepler[01-32],merkur[01-32],metis[01-72],oberon[01-16],osiris[01-05],rhea[01-32],seth[01-02],sinope[01-72],thalia[01-60],titan[01-08],triton[01-16],tycho[01-32],hathor[01-08],mimas[01-08],pallas[01-08] MaxTime=2-00:00:00 DefaultTime=2:00:00 State=UP QOS=default PriorityJobFactor=100
PartitionName=short Nodes=amun[01-10],ananke[01-04],anubis[01-16],apollo[01-04],ceres[01-04],elara[01-60],flora[01-10],gaspra[01-52],hermes[01-08],hekat[01-05],horus[01-10],ida[01-08],io[01-32],iris[01-12],juno[01-40],kepler[01-32],leda[01-72],merkur[01-32],metis[01-72],oberon[01-16],osiris[01-05],rhea[01-32],sinope[01-72],snowy[01-20],titan[01-08],triton[01-16],tycho[01-32],hathor[01-08],mimas[01-08],pallas[01-08],thalia[01-60] MaxTime=2:00:00 DefaultTime=1:00:00 State=UP Default=YES QOS=default PriorityJobFactor=100
PartitionName=testing Nodes=icarus[01-05],snowy[01-20],titan[01-08],openpower MaxTime=2-00:00:00 DefaultTime=01:00:00 State=UP QOS=default AllowGroups=edv