This is my solution to this problem so far:
1. Create a topology file so that Slurm will not place jobs across two
different message-passing networks (see the sketch after this list).
2. Create a partition called "general" for all the general-access nodes in
my environment.
3. Create a partition that is a duplicate of "general" …
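A rough sketch of steps 1 and 2, with hypothetical switch and node names
(the idea being that two switch trees with no common parent keep jobs from
spanning the two networks):

    # topology.conf -- hypothetical names; one root switch per
    # message-passing network, with no switch connecting them
    SwitchName=ib1 Nodes=node[001-032]
    SwitchName=ib2 Nodes=node[033-064]

    # slurm.conf
    TopologyPlugin=topology/tree
    PartitionName=general Nodes=node[001-064] Default=YES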
Prentice,
So, perhaps more like an a la carte menu? I could see having the
job_submit.lua plugin block submission unless specific constraint
classes are defined. Pair that with a QOS that a user needs to select
and you can (almost) do away with partitions. You could have a chip class
(amd, intel), …
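A minimal (untested) sketch of what I'm picturing; the message text and
feature names are made up, and a real version would also validate the
requested features against a whitelist of known classes:

    -- job_submit.lua: reject any job that requests no constraint class
    function slurm_job_submit(job_desc, part_list, submit_uid)
       if job_desc.features == nil or job_desc.features == '' then
          slurm.log_user("Please request a constraint class, e.g. --constraint=amd")
          return slurm.ERROR
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end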
Cyrus,
Thanks for the input. Yes, I have considered features/constraints as
part of this, and I'm already using them for users to request IB. They
are definitely a key part of my strategy. I will look into Spank and
PriorityTiers. One of my goals is to reduce the amount of
scripting/customization …
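From my first read of the docs, PriorityTier looks like a per-partition
setting in slurm.conf, roughly like this (partition and node names here
are placeholders):

    # slurm.conf -- with PreemptType=preempt/partition_prio, jobs in
    # the higher-tier partition can preempt jobs in the lower one
    PartitionName=general   Nodes=node[001-064] PriorityTier=10
    PartitionName=scavenger Nodes=node[001-064] PriorityTier=1 PreemptMode=REQUEUE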
Hi Prentice,
Have you considered Slurm features and constraints at all? You provide
features (arbitrary strings in your slurm.conf) of what your hardware
can provide ("amd", "ib", "FAST", "whatever"). A user then will list
constraints using typical and/or/regex notation ( --constraint=amd&ib ).
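For example (node names and features invented here):

    # slurm.conf
    NodeName=node[001-032] Feature=amd,ib
    NodeName=node[033-064] Feature=intel

    # user asks for an AMD node that also has IB; quote the & for the shell
    sbatch --constraint="amd&ib" job.sh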
I left out a *very* critical detail: One of the reasons I'm looking at
revamping my Slurm configuration is that my users have requested the
capability to submit long-running, low-priority interruptible jobs that
can be killed and requeued when shorter-running, higher-priority jobs
need to use the resources.
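For reference, the preemption setup I've been reading about would look
roughly like this (the "scavenger" QOS name is just a placeholder):

    # slurm.conf
    PreemptType=preempt/qos
    PreemptMode=REQUEUE

    # sacctmgr: let the normal QOS preempt the low-priority one;
    # preempted jobs are requeued (batch jobs are requeueable by default)
    sacctmgr add qos scavenger
    sacctmgr modify qos normal set preempt=scavenger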
Slurm Users,
I would like your input on the best way to configure Slurm for a
heterogeneous cluster I am responsible for. This e-mail will probably be
a bit long, since it includes all the necessary details of my environment,
so thanks in advance to those of you who read all of it!
The cluster I support …