Thanks Davide,

It's true that srun will create an allocation if you aren't inside a job, but if you are inside a job and request more resources than that job has, srun will just fail. That is exactly the issue I'm trying to avoid.
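One workaround I'm considering (an untested sketch, and it assumes srun decides whether it is already inside an allocation purely from the SLURM_* environment variables) is to scrub those variables before calling srun, so it behaves as if it were launched from a login node and requests a fresh allocation. The resource flags and script name below are just placeholders:

    # Untested sketch: hide the parent job's SLURM_* environment so srun
    # asks for a new allocation instead of reusing the current one.
    for v in $(compgen -v | grep '^SLURM_'); do unset "$v"; done

    # srun keeps stdin/stdout/stderr attached, forwards signals to the
    # remote tasks, and only returns once the step has finished.
    srun --mem=4G --cpus-per-task=2 python my_script.py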
On Sat, Apr 5, 2025 at 11:48 AM Davide DelVento <davide.quan...@gmail.com> wrote:

> The plain srun is probably the best bet, and if you really need the thing
> to be started from another slurm job (rather than the login node) you will
> need to exploit the fact that
>
> > If necessary, srun will first create a resource allocation in which to
> > run the parallel job.
>
> AFAIK, there is no option to force the "create a resource allocation" even
> if it's not necessary. But you may try to request something that is "above
> and beyond" what the current allocation provides, and that might solve your
> problem.
> Looking at the srun man page, I could speculate that --clusters
> or --cluster-constraint might help in that regard (but I am not sure).
>
> Have a nice weekend
>
> On Fri, Apr 4, 2025 at 6:27 AM Michael Milton via slurm-users
> <slurm-users@lists.schedmd.com> wrote:
>
>> I'm helping with a workflow manager that needs to submit Slurm jobs. For
>> logging and management reasons, the job (e.g. srun python) needs to be run
>> as though it were a regular subprocess (python):
>>
>> - stdin, stdout and stderr for the command should be connected to the
>> process inside the job
>> - signals sent to the command should be sent to the job process
>> - We don't want to use the existing job allocation, if this is run
>> from a Slurm job
>> - The command should only terminate when the job is finished, to
>> avoid us needing to poll Slurm
>>
>> We've tried:
>>
>> - sbatch --wait, but then SIGTERM'ing the process doesn't kill the job
>> - salloc, but that requires a TTY process to control it (?)
>> - salloc srun seems to mess with the terminal when it's killed,
>> likely because of being "designed to be executed in the foreground"
>> - Plain srun re-uses the existing Slurm allocation, and specifying
>> resources like --mem will just request them from the current job rather
>> than submitting a new one
>>
>> What is the best solution here?
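For completeness, the sbatch route quoted above might also be salvageable with a small wrapper, though I haven't tested this either. The idea is to keep sbatch --wait for the blocking behaviour but run it in the background, so a trapped SIGTERM/SIGINT can scancel the job (bash only runs traps while it is sitting at the wait builtin, not while a foreground child is running). The job name, resources and payload are placeholders, and note this still doesn't give you connected stdout/stderr, which go to the batch output file:

    name="wf-$$"
    # --wait blocks until the batch job finishes; run it in the background
    # so the trap below can fire while we wait.
    sbatch --wait --job-name="$name" --mem=4G --wrap "python my_script.py" &
    sbatch_pid=$!

    # Cancel the job (by name) if the wrapper itself is terminated,
    # then reap the backgrounded sbatch.
    trap 'scancel --name="$name"; wait "$sbatch_pid"' TERM INT

    wait "$sbatch_pid"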
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com