Hello List, we're operating a large-ish cluster (about 900 nodes) with diverse hardware. It has been running with SGE for several years now, but the more we refine our configuration, the more we run up against SGE's limitations. Therefore, we're considering switching to Slurm.
The latest challenge is this: a certain class of nodes has been optimized for small jobs, and we'd like each of them to act as two "half nodes", so that a job can use one of the two GPUs plus, at most, half of the CPUs. With SGE, we've put two queues on these nodes, but that effectively prevents certain maintenance jobs from running.

How would I configure these nodes in Slurm? From the docs I gathered that MaxTRESPerJob would be a solution (a rough sketch of what I have in mind is in the P.S. below), but it is coupled to associations, which I do not fully understand. Is this the best/only way to achieve such a partitioning? If so, do I need to define an association for every user, or can I define a default/skeleton association that new users automatically inherit? Are there other/better ways to go?

Thanks a lot,

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
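
P.S. In case a concrete example helps: this is roughly what I pieced together from the QOS documentation. The QOS name, node names and limits are only placeholders for a hypothetical node with 48 cores and 2 GPUs, so please correct me if I've misread the docs.

  # Create a QOS that caps a single job at one GPU and half the cores
  # ("halfnode" and the numbers are placeholders)
  sacctmgr add qos halfnode
  sacctmgr modify qos halfnode set MaxTRESPerJob=cpu=24,gres/gpu=1

  # slurm.conf: track GPUs as a TRES and attach the QOS to the partition
  AccountingStorageTRES=gres/gpu
  NodeName=gpu[001-020] CPUs=48 Gres=gpu:2
  PartitionName=small Nodes=gpu[001-020] QOS=halfnode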