[slurm-users] Re: Elastic Computing: Is it possible to incentivize grouping power_up calls?

Brian Andrus via slurm-users Mon, 08 Apr 2024 09:18:34 -0700

Xaver,

You may want to look at the ResumeRate option in slurm.conf:


   ResumeRate
   The rate at which nodes in power save mode are returned to normal
   operation by ResumeProgram. The value is a number of nodes per
   minute and it can be used to prevent power surges if a large number
   of nodes in power save mode are assigned work at the same time (e.g.
   a large job starts). A value of zero results in no limits being
   imposed. The default value is 300 nodes per minute.
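
As a rough illustration only, the relevant lines in slurm.conf might
look something like the fragment below. The script paths and the
numeric values are placeholders chosen for the example, not values
from any real cluster, and ResumeRate only caps how many nodes per
minute are handed to ResumeProgram.

```
# slurm.conf (fragment); paths and values are placeholders
ResumeProgram=/path/to/resume.sh      # hypothetical script path
SuspendProgram=/path/to/suspend.sh    # hypothetical script path
ResumeRate=60      # nodes per minute passed to ResumeProgram; 0 = no limit
SuspendTime=600    # seconds a node sits idle before it is powered down
```

After editing slurm.conf, the change has to be picked up by the
controller (for example with "scontrol reconfigure") before it takes
effect.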

I have all our nodes in the cloud, and they power down/deallocate when
idle for a bit. I do not use Ansible to start them; I use the CLI
directly, so the only CPU usage is from that command. I do plan on
having Ansible run from the node itself to apply any hot-fixes/updates
to the base image or other changes. Running it from the node would
avoid any CPU spikes on the Slurm head node.


Just a possible path to look at.

Brian Andrus

On 4/8/2024 6:10 AM, Xaver Stiensmeier via slurm-users wrote:

Dear slurm user list,

we make use of elastic cloud computing, i.e. node instances are created
on demand and are destroyed when they are not used for a certain amount
of time. Created instances are set up via Ansible. If more than one
instance is requested at the exact same time, Slurm will pass those into
the resume script together and one Ansible call will handle all those
instances.

However, more often than not workflows will request multiple instances
within the same second, but not at the exact same time. This leads to
multiple resume script calls and therefore to multiple Ansible calls.
This will lead to less clear log files, greater CPU consumption by the
multiple running Ansible calls and so on.

What I am looking for is an option to force Slurm to wait a certain
amount and then perform a single resume call for all instances within
that time frame (let's say 1 second).

Is this somehow possible?

Best regards,
Xaver

-- 
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
