Re: [slurm-users] Rolling reboot with at most N machines down simultaneously?

2022-08-04 Thread David Simpson
Another way might be to implement Slurm power off/on (if not already in place) and trigger it as required. - David Simpson - Senior Systems Engineer ARCCA, Redwood Building, King Edward VII Avenue, Cardiff, CF10 3NB
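As a rough illustration of the idea above, and not something posted in the thread, the following sketch splits a node list into groups of at most N and builds one `scontrol update ... State=POWER_DOWN` call per group. The node names and the helper names `batches` and `power_down_command` are hypothetical, and it assumes power saving is already configured on the cluster. It only builds the commands; running them and waiting for each group to come back before starting the next is left to the site.

```python
from typing import Iterator, List, Sequence


def batches(nodes: Sequence[str], max_down: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_down node names."""
    if max_down < 1:
        raise ValueError("max_down must be at least 1")
    for start in range(0, len(nodes), max_down):
        yield list(nodes[start:start + max_down])


def power_down_command(batch: Sequence[str]) -> List[str]:
    """Build (but do not run) an scontrol call that powers down one batch."""
    return ["scontrol", "update", "NodeName=" + ",".join(batch), "State=POWER_DOWN"]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = ["node01", "node02", "node03", "node04", "node05"]
    for batch in batches(nodes, max_down=2):
        # Printed rather than executed, so the sketch is safe to run anywhere.
        print(" ".join(power_down_command(batch)))
```

Keeping the batching separate from the command construction means the same grouping could be reused with a different action, for example a reboot request instead of a power down.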

Re: [slurm-users] monitoring and update regime for Power Saving nodes

2022-02-24 Thread David Simpson
a dummy job to bring powered-down nodes up, then a clustershell slurmd stop, is probably the answer. regards, David
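A minimal sketch of those two steps, under stated assumptions: a trivial `sbatch --wrap` job pinned to the target nodes is used as the dummy job, and `clush -w` with `systemctl stop slurmd` is used for the clustershell step. The helper names and node names are hypothetical, the commands are only built and not executed, and any `--reservation` or partition options a site needs during maintenance are deliberately left out.

```python
from typing import List, Sequence


def wake_job_command(nodes: Sequence[str]) -> List[str]:
    """Build an sbatch call for a no-op job that must run on every listed node."""
    if not nodes:
        raise ValueError("at least one node is required")
    return [
        "sbatch",
        "--nodelist=" + ",".join(nodes),  # force the job onto these nodes
        "--nodes=" + str(len(nodes)),     # one allocation slot per node
        "--wrap=true",                    # the job body is just the `true` command
    ]


def stop_slurmd_command(nodes: Sequence[str]) -> List[str]:
    """Build a clustershell call that stops slurmd on the listed nodes."""
    if not nodes:
        raise ValueError("at least one node is required")
    return ["clush", "-w", ",".join(nodes), "systemctl", "stop", "slurmd"]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    targets = ["node01", "node02"]
    print(" ".join(wake_job_command(targets)))
    print(" ".join(stop_slurmd_command(targets)))
```

The second command should only be issued once the nodes have actually resumed and the dummy job has finished, which this sketch does not check.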

Re: [slurm-users] monitoring and update regime for Power Saving nodes

2022-02-24 Thread David Simpson
nd any down nodes will automatically read the latest. Yes, currently we use file-based configuration, with the config written to the compute nodes’ own disks via Ansible. Perhaps we will consider moving the file to a shared filesystem. regards, David

[slurm-users] monitoring and update regime for Power Saving nodes

2022-02-23 Thread David Simpson
anything else) to a node which is down due to power saving (during a maintenance/reservation), what is your approach? Do you end up with two slurm.confs (one for power saving and one that keeps everything up, to work on during the maintenance)? thanks, David

Re: [slurm-users] Validating SLURM sreport cluster utilization report

2021-01-29 Thread David Simpson
Out of interest (for those that do record and/or report on uptime): if you aren't using the sreport cluster utilization report, what alternative method are you using instead? If you are using the sreport cluster utilization report, have you encountered this? thanks, David

[slurm-users] Validating SLURM sreport cluster utilization report

2021-01-22 Thread David Simpson
ems with 3 nodes. So at the moment, off the top of our heads, we don't understand this reported Down time. Is anyone else relying on sreport for this metric? If so, have you encountered this sort of situation? regards, David