Jens, you are most welcome. I'll be curious to hear a follow-up after you
have evaluated it. I myself have not used it in production, but I thought
it looked really cool.
It is interesting to me for increasing overall resource utilization for
partitions with transient "real time" demands. I should be able to take,
say, 25% of this real-time partition, overlap those nodes with another
general production partition, set the real-time partition's priority
slightly higher, and set a floating reservation to keep job turnaround
fast on the overlapping nodes in case the real-time partition becomes
busy. I think this would be a nice solution that does not involve job
preemption.
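A rough, untested sketch of what I have in mind (partition names, node
ranges, and the reservation name are all placeholders):

# slurm.conf: the general partition overlaps the last five real-time nodes;
# the real-time partition gets a slightly higher PriorityTier so its jobs
# are considered first, without configuring any preemption.
PartitionName=general  Nodes=gen[01-40],rt[16-20] Default=YES PriorityTier=1 MaxTime=14-00:00:00 State=UP
PartitionName=realtime Nodes=rt[01-20]            Default=NO  PriorityTier=2 MaxTime=1-00:00:00  State=UP

# Floating reservation on the shared nodes so that only jobs able to finish
# within one day land there, keeping turnaround fast if the real-time
# partition suddenly gets busy:
scontrol create reservation ReservationName=rt_buffer Users=root \
    StartTime=now+1day Duration=UNLIMITED Flags=TIME_FLOAT Nodes=rt[16-20]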
Cheers,
Cyrus
On 08/13/2018 11:20 AM, Jens Dreger wrote:
Hi Cyrus!
On Mon, Aug 13, 2018 at 08:44:15AM -0500, Cyrus Proctor wrote:
Hi Jens,
Check out https://slurm.schedmd.com/reservations.html, specifically the
"Reservations Floating Through Time" section. In your case, set a walltime of
14 days for your partition that contains n[01-10]. Then, create a floating
reservation on nodes n[06-10] starting at now + 1 day, where "now" is
continually re-evaluated as the current time.
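In concrete terms (an untested sketch; the reservation name and user are
placeholders), that floating reservation could be created along these lines:

# The start time floats: it is always re-evaluated as one day in the future,
# so jobs needing more than one day of walltime can never be scheduled on
# n[06-10]:
scontrol create reservation ReservationName=short_only Users=root \
    StartTime=now+1day Duration=UNLIMITED Flags=TIME_FLOAT Nodes=n[06-10]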
This is just perfect! Thank you!
Jens.
If you wish to allow the user more control, then specify a "Feature" in
slurm.conf for your nodes. Something like:
NodeName=n[01-05] Sockets=1 CoresPerSocket=48 ThreadsPerCore=2 State=UNKNOWN
Feature=long
NodeName=n[06-10] Sockets=1 CoresPerSocket=48 ThreadsPerCore=2 State=UNKNOWN
Feature=short
The feature is an arbitrary string that the admin sets. A user could then
specify it in their submission with something like:
sbatch --constraint="long|short" batch.slurm
Best,
Cyrus
On 08/13/2018 08:28 AM, Loris Bennett wrote:
Hi Jens,
Jens Dreger <jens.dre...@physik.fu-berlin.de> writes:
Hi everyone!
Is it possible to transparently assign different walltime limits
to nodes without forcing users to specify partitions when submitting
jobs?
Example: let's say I have 10 nodes. Nodes n01-n05 should be available
for jobs with a walltime up to 14 days, while n06-n10 should only
be used for jobs with a walltime limit less than 1 day. Then, as long
as nodes n06-n10 have free resources, jobs with walltime <1day should
be scheduled to these nodes. If n06-n10 are full, jobs with walltime
<1day should start on n01-n05. Users should not have to specify
partitions.
Would this even be possible with just one partition, much like handling
nodes with different memory sizes by using weights to fill the nodes with
less memory first?
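For illustration, the weight mechanism I mean looks roughly like this in
slurm.conf (node names and memory sizes are made up):

# Lower-Weight nodes are allocated first, so the smaller-memory nodes
# fill up before the larger-memory ones.
NodeName=small[01-05] RealMemory=128000 Weight=10
NodeName=big[01-05]   RealMemory=256000 Weight=20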
The background of this question is that it would be helpful to be able
to lower the walltime for a rack of nodes, e.g. when adding the rack
to an existing cluster, in order to be able to easily shut down just
this rack after one day in case of instabilities. Much like adding
N nodes to a cluster without changing anything else and having only
jobs with walltime <1 day on these nodes in the beginning.
If you just want to reduce the allowed wall-time for a given rack, can't
you just use a maintenance reservation for the appropriate set of nodes?
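For example, something like this (an untested sketch; the reservation name
and node list are placeholders):

# Reservation on the new rack starting one day out; jobs that cannot finish
# before it begins will not be scheduled onto these nodes. Unlike a floating
# reservation, this start time is fixed once the reservation is created.
scontrol create reservation ReservationName=rack_maint Users=root \
    StartTime=now+1day Duration=UNLIMITED Flags=MAINT Nodes=n[06-10]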
Loris