On Wed, 29 Mar 2023 14:42:33 +0200, Ben Polman <ben.pol...@science.ru.nl> wrote:
> I'd be interested in your kludge, we face a similar situation where the
> slurmctld node does not have access to the ipmi network and can not ssh
> to machines that have access. We are thinking on creating a rest
> interface to a control server which would be running the ipmi commands

We settled on transient files in /dev/shm on the slurmctld side as "API".
You could call it an in-memory transactional database ;-)

#!/bin/sh
# node-suspend and node-resume (symlinked) script
powerdir=/dev/shm/powersave
scontrol=$(cd "$(dirname "$0")" && pwd)/scontrol
hostlist=$1
case $0 in
*-suspend) subdir=suspend ;;
*-resume)  subdir=resume  ;;
esac
mkdir -p "$powerdir/$subdir" && cd "$powerdir/$subdir" &&
tmp=$(mktemp XXXXXXX.tmp) &&
$scontrol show hostnames "$hostlist" > "$tmp" &&
echo "$(date +%Y%m%d-%H%M%S) $(basename "$0") $(tr '\n' ' ' < "$tmp")" >> "$powerdir/log"
mv "$tmp" "${tmp%.tmp}.list"
# end

This atomically creates powersave/suspend/*.list and powersave/resume/*.list
files with node names in them. On the privileged server, a script
periodically looked at the directories (via ssh) and triggered the
appropriate actions, including some heuristics about unclean shutdowns or
spontaneous re-availability (with a thousand runs, there's a good chance of
something getting stuck, even in some driver code).
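The load-bearing detail here is that a rename within one filesystem is
atomic, so the watcher never sees a half-written list: it either finds a
complete *.list file or nothing. A minimal self-contained sketch of that
handoff (hypothetical node names, and a throwaway directory standing in
for /dev/shm/powersave):

```shell
#!/bin/sh
# Demonstrate the atomic write-then-rename handoff, with a
# throwaway directory standing in for /dev/shm/powersave.
dir=$(mktemp -d)
mkdir -p "$dir/suspend"
cd "$dir/suspend" || exit 1

# Producer side: write the node list to a .tmp file first, then
# rename it. mv within one filesystem is atomic, so a reader
# globbing *.list sees either the complete file or nothing.
tmp=$(mktemp XXXXXXX.tmp)
printf '%s\n' node001 node002 > "$tmp"
mv "$tmp" "${tmp%.tmp}.list"

# A second list still being written: it stays .tmp and must remain
# invisible to the consumer.
tmp2=$(mktemp XXXXXXX.tmp)
printf '%s\n' node999 > "$tmp2"

# Consumer side: only finished *.list files are picked up.
picked=
for f in *.list
do
  picked="$picked $(tr '\n' ' ' < "$f")"
  rm -f "$f"
done
echo "would suspend:$picked"
cd / && rm -rf "$dir"
```

The same pattern carries over unchanged to the resume direction; the
only difference in the real scripts is which subdirectory the list
lands in.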
#!/bin/sh
powerdir=/dev/shm/powersave
batch()
{
  ssh-wrapper-that-correctly-quotes-argument-list --host=batchhost "$@"
}
while sleep 5
do
  suspendlists=$(batch ls "$powerdir/suspend/" 2>/dev/null | grep '\.list$')
  for f in $suspendlists
  do
    hosts=$(batch cat "$powerdir/suspend/$f" 2>/dev/null)
    for h in $hosts
    do
      case "$h" in
      node*|data*)
        echo "suspending $h"
        node-shutdown-wrapper "$h"
      ;;
      *)
        echo "malformed node name"
      ;;
      esac
    done
    batch rm -f "$powerdir/suspend/$f"
  done
  resumelists=$(batch ls "$powerdir/resume/" 2>/dev/null | grep '\.list$')
  for f in $resumelists
  do
    hosts=$(batch cat "$powerdir/resume/$f" 2>/dev/null)
    for h in $hosts
    do
      case "$h" in
      node*)
        echo "resuming $h"
        # Assume the node _should_ be switched off. Ensure that now (in
        # case it hung during shutdown).
        if ipmi-wrapper "$h" chassis power status | grep -q 'on$'
        then
          if ssh -o ConnectTimeout=2 "$h" pgrep slurmd >/dev/null 2>&1 </dev/null
          then
            echo "skipping apparently active node $h"
          else
            echo "forcing power reset on $h"
            ipmi-wrapper "$h" chassis power reset
          fi
        else
          ipmi-wrapper "$h" chassis power on
        fi
        # Wait to make sure?
      ;;
      *)
        echo "malformed node name"
      ;;
      esac
    done
    batch rm -f "$powerdir/resume/$f"
  done
done
# end

The current approach handles resume better, waiting for a number of hosts
at the same time and only un-draining those that reappeared. Back then, we
relied on the nodes being automatically incorporated by slurmctld. This
worked mostly, but not always, resulting in spurious NODE_FAILs which
started to annoy users.

Alrighty then,

Thomas

--
Dr. Thomas Orgis
HPC @ Universität Hamburg