Hello List, we're operating a large-ish cluster (about 900 nodes) with diverse hardware. It has been running with SGE for several years now, but the more we refine our configuration, the more we run up against SGE's limitations. Therefore, we're considering switching to Slurm.
The latest challenge is this: a certain class of nodes has been optimized for small jobs, and we'd like each of them to act as two "half nodes", so that a job can use one of the two GPUs plus, at most, half of the CPUs. With SGE, we've put two queues on these nodes, but that effectively prevents certain maintenance jobs from running.

How would I configure these nodes in Slurm? From the docs I gathered that MaxTRESPerJob would be a solution (a rough sketch of what I have in mind is in the P.S. below), but it is coupled to associations, which I do not fully understand. Is this the best/only way to achieve such a partitioning? If so, do I need to define an association for every user, or can I define a default/skeleton association that new users automatically inherit? Are there other/better ways to go?

Thanks a lot,

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
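
P.S. In case a concrete example helps: this is roughly what I pieced together from the QOS documentation. The QOS name, node names and limits are only placeholders for a hypothetical node with 48 cores and 2 GPUs, so please correct me if I've misread the docs.

  # Create a QOS that caps a single job at one GPU and half the cores
  # ("halfnode" and the numbers are placeholders)
  sacctmgr add qos halfnode
  sacctmgr modify qos halfnode set MaxTRESPerJob=cpu=24,gres/gpu=1

  # slurm.conf: track GPUs as a TRES and attach the QOS to the partition
  AccountingStorageTRES=gres/gpu
  NodeName=gpu[001-020] CPUs=48 Gres=gpu:2
  PartitionName=small Nodes=gpu[001-020] QOS=halfnode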