Another way might be to implement Slurm power saving (node power off/on), if
it is not already in place, and trigger it as required.
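
As a rough illustration of triggering power saving by hand, here is a minimal
Python sketch that builds the scontrol calls for POWER_DOWN and POWER_UP. It
assumes power saving (SuspendProgram/ResumeProgram) is already configured in
slurm.conf. The helper names and the node range in the usage note are
hypothetical, and by default nothing is executed, the command is only printed.

```python
import subprocess


def power_command(nodes: str, action: str) -> list[str]:
    """Build an scontrol command asking Slurm to power nodes up or down.

    `nodes` is a Slurm hostlist expression; `action` is "up" or "down".
    """
    states = {"up": "POWER_UP", "down": "POWER_DOWN"}
    if action not in states:
        raise ValueError(f"action must be 'up' or 'down', got {action!r}")
    return ["scontrol", "update", f"NodeName={nodes}", f"State={states[action]}"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

For example, run(power_command("node[01-03]", "down")) prints the command that
would power those (hypothetical) nodes down, without running it.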
-
David Simpson - Senior Systems Engineer
ARCCA, Redwood Building,
King Edward VII Avenue,
Cardiff, CF10 3NB
a dummy job to bring powered-down nodes up,
then a clustershell slurmd stop, is probably the answer
regards
David
-----
nd any down nodes will automatically read the latest.
Yes, we currently use file-based config, written to the compute nodes' own
disks via Ansible. Perhaps we will consider moving the file to a shared
filesystem.
regards
David
-
anything else) to a node
which is down due to power saving (during a maintenance/reservation) what is
your approach? Do you end up with 2 slurm.confs (one for power saving and one
that keeps everything up, to work on during the maintenance)?
thanks
David
-
Out of interest (for those that do record and/or report on uptime), if you
aren't using the sreport cluster utilization report, what alternative method
are you using instead?
If you are using the sreport cluster utilization report, have you encountered
this?
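
For anyone wanting to pull the same numbers into a script, here is a minimal
Python sketch of the report in question. It builds an sreport cluster
utilization call with parsable output (-P) and percentages (-t percent), and
parses the pipe-delimited result. The function names are hypothetical, and the
parser assumes the first line containing "|" is the header row and that any
preamble lines contain no "|".

```python
import subprocess


def utilization_command(start: str, end: str) -> list[str]:
    """Build an sreport cluster utilization command for a date range.

    `start` and `end` are dates such as "2021-01-01".
    """
    return ["sreport", "-P", "-t", "percent",
            "cluster", "utilization", f"start={start}", f"end={end}"]


def parse_parsable2(text: str) -> list[dict[str, str]]:
    """Turn pipe-delimited report text into one dict per data row.

    Lines without "|" are skipped; the first remaining line is treated as
    the header (an assumption about the output layout).
    """
    lines = [line for line in text.splitlines() if "|" in line]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def cluster_utilization(start: str, end: str) -> list[dict[str, str]]:
    """Run sreport and return the parsed rows (requires Slurm accounting)."""
    out = subprocess.run(utilization_command(start, end),
                         check=True, capture_output=True, text=True).stdout
    return parse_parsable2(out)
```

Each returned row is keyed by whatever column names sreport prints, so the
Down column can be read per cluster without hard-coding the field order.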
thanks
David
-
ems with 3 nodes. So at
the moment, off the top of our heads, we don't understand this reported Down
time.
Is anyone else relying on sreport for this metric? If so have you encountered
this sort of situation?
regards
David
-----