I don't know the specifics of your setup, but you could easily add
lines to your suspend/resume scripts that check for tasks still in
progress and wait for them before starting the next one.
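
As a rough illustration of that idea (a sketch under assumptions, not a
tested recipe for any particular cloud): a ResumeProgram written in
Python could take an exclusive file lock before touching the cloud API,
so that overlapping invocations queue up behind each other. The lock
path, run_serialized() and scale_out() are hypothetical names made up
for this example; scale_out() stands in for whatever call your cloud
needs. The only Slurm behavior assumed is that the program receives the
hostlist expression as its first argument.

```python
#!/usr/bin/env python3
"""Sketch of a ResumeProgram wrapper that serializes cloud scale operations."""
import fcntl
import sys

# Hypothetical path; SuspendProgram and ResumeProgram must use the same file
# so that scale-in and scale-out also exclude each other.
LOCK_PATH = "/var/run/slurm/cloud_scale.lock"


def run_serialized(lock_path, action, *args):
    """Run action(*args) while holding an exclusive lock on lock_path."""
    with open(lock_path, "w") as lock_file:
        # Blocks until any earlier holder has released the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return action(*args)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def scale_out(hostlist):
    """Placeholder: call your private cloud's API here and wait until it finishes."""
    print(f"scaling out {hostlist}")


# Slurm passes the hostlist expression as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    run_serialized(LOCK_PATH, scale_out, sys.argv[1])
```

A SuspendProgram could wrap its scale-in call the same way. Two caveats:
flock() does not guarantee that waiters are served in arrival order, and
time spent waiting for the lock counts against ResumeTimeout and
SuspendTimeout, so those would need to be set generously.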
Brian Andrus
On 1/15/2024 12:22 AM, 김종록 wrote:
I'm going to use Slurm's cloud feature in a private cloud.
The problem is that my cloud cannot scale instances out/in
concurrently.
This means that if there is a scale out/in trigger, no other work is
done until the trigger is completed.
So a Suspend/Resume issued later must start only after the previous
operation has completed, but the timeout needed is not known accurately.
Is there any way to limit Suspend/Resume requests in Slurm?
As far as I know, there are SuspendRate and ResumeRate, but these only
limit the number of nodes per minute and do not limit concurrency.