I don’t think that’s true. Others have already shared documentation on interactive jobs and the S commands that explains exactly how this works, and it seems to have been ignored.
[novosirj@amarel2 ~]$ salloc -n1
salloc: Pending job allocation 83053985
salloc: job 83053985 queued and waiting for resources
salloc: job 83053985 has been allocated resources
salloc: Granted job allocation 83053985
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job

This is the behavior I’ve always seen. If I include a command at the end of the line, it simply runs in the “new” shell that salloc creates (which, you’ll notice, you can leave via Ctrl-D or exit):

[novosirj@amarel2 ~]$ salloc -n1 hostname
salloc: Pending job allocation 83054458
salloc: job 83054458 queued and waiting for resources
salloc: job 83054458 has been allocated resources
salloc: Granted job allocation 83054458
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job
amarel2.amarel.rutgers.edu
salloc: Relinquishing job allocation 83054458

You can, however, tell it to srun something in that shell instead:

[novosirj@amarel2 ~]$ salloc -n1 srun hostname
salloc: Pending job allocation 83054462
salloc: job 83054462 queued and waiting for resources
salloc: job 83054462 has been allocated resources
salloc: Granted job allocation 83054462
salloc: Waiting for resource configuration
salloc: Nodes node073 are ready for job
node073.perceval.rutgers.edu
salloc: Relinquishing job allocation 83054462

When you use salloc, it starts an allocation and sets up the environment:

[novosirj@amarel2 ~]$ env | grep SLURM
SLURM_NODELIST=slepner012
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_MEM_PER_CPU=4096
SLURM_NNODES=1
SLURM_JOBID=83053985
SLURM_NTASKS=1
SLURM_TASKS_PER_NODE=1
SLURM_JOB_ID=83053985
SLURM_SUBMIT_DIR=/cache/home/novosirj
SLURM_NPROCS=1
SLURM_JOB_NODELIST=slepner012
SLURM_CLUSTER_NAME=amarel
SLURM_JOB_CPUS_PER_NODE=1
SLURM_SUBMIT_HOST=amarel2.amarel.rutgers.edu
SLURM_JOB_PARTITION=main
SLURM_JOB_NUM_NODES=1

If you subsequently run srun, it runs on the compute node, but a regular command runs right where you are:

[novosirj@amarel2 ~]$ srun hostname
slepner012.amarel.rutgers.edu
[novosirj@amarel2 ~]$ hostname
amarel2.amarel.rutgers.edu

Again, I’d advise Mahmood to read the documentation that was already provided. It doesn’t really matter what behavior is requested; that is not what this command does. If you want to run a script on a compute node, the correct command is sbatch (there’s a minimal sketch at the end of this message, below the quoted thread). I’m not sure what advantage salloc with srun has; I assume it’s so you can open an allocation and then occasionally send srun commands over to it (also sketched below).

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Jan 2, 2019, at 12:20 PM, Terry Jones <te...@jon.es> wrote:
>
> I know very little about how SLURM works, but this sounds like a configuration issue: the cluster hasn’t been configured in a way that indicates the login nodes cannot also be used as compute nodes. When I run salloc on the cluster I use, I *always* get a shell on a compute node, never on the login node where I ran salloc.
>
> Terry
>
> On Wed, Jan 2, 2019 at 4:56 PM Mahmood Naderan <mahmood...@gmail.com> wrote:
> Currently, users run "salloc --spankx11 ./qemu.sh", where qemu.sh is a script that runs a qemu-system-x86_64 command.
> When user (1) runs that command, the qemu process runs on the login node, since the user is accessing the login node.
> When user (2) runs that command, his qemu process also runs on the login node, and so on.
>
> That is not what I want! I expected Slurm to dispatch the jobs to compute nodes.
>
> Regards,
> Mahmood
>
> On Wed, Jan 2, 2019 at 7:39 PM Renfro, Michael <ren...@tntech.edu> wrote:
> I’m not sure what the reasons are behind “have to manually ssh to a node”, but salloc and srun can be used to allocate resources and run commands on the allocated resources.
>
> Before allocation, regular commands run locally, and no Slurm-related variables are present:
>
> =====
>
> [renfro@login ~]$ hostname
> login
> [renfro@login ~]$ echo $SLURM_TASKS_PER_NODE
>
> =====
>
> After allocation, regular commands still run locally, Slurm-related variables are present, and srun runs commands on the allocated node (my prompt change inside a job is a local thing, not done by default):
>
> =====
>
> [renfro@login ~]$ salloc
> salloc: Granted job allocation 147867
> [renfro@login(job 147867) ~]$ hostname
> login
> [renfro@login(job 147867) ~]$ echo $SLURM_TASKS_PER_NODE
> 1
> [renfro@login(job 147867) ~]$ srun hostname
> node004
> [renfro@login(job 147867) ~]$ exit
> exit
> salloc: Relinquishing job allocation 147867
> [renfro@login ~]$
>
> =====
>
> Lots of people get interactive shells on a reserved node with some variant of ‘srun --pty $SHELL -I’, which doesn’t require explicitly running salloc or ssh, so what are you trying to accomplish in the end?
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> 931 372-3601 / Tennessee Tech University
>
> > On Jan 2, 2019, at 9:24 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
> >
> > I want to know if there is any way to push the node-selection part onto Slurm, rather than it being a manual thing done by the user.
> > Currently, I have to manually ssh to a node and try to "allocate resources" using salloc.
> >
> > Regards,
> > Mahmood
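As promised above, here is a minimal sbatch sketch for the qemu case. To be clear about assumptions: the #SBATCH values (task count, CPUs, memory, time limit) are placeholders to adjust for the VM, the job-script file name is made up, and qemu.sh is the script from Mahmood’s message.

=====

#!/bin/bash
#SBATCH --job-name=qemu       # name shown in squeue
#SBATCH --ntasks=1            # one task: the qemu process
#SBATCH --cpus-per-task=2     # assumption; match the VM's CPU count
#SBATCH --mem=4G              # assumption; match the VM's memory
#SBATCH --time=04:00:00       # assumption; adjust to the expected run time

# sbatch runs this whole script on a compute node that Slurm picks,
# so nothing below ever executes on the login node.
./qemu.sh

=====

Each user submits it with "sbatch qemu-job.sh" (again, the file name is just an example) from the login node, and Slurm handles the node selection; nobody has to ssh anywhere.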
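And here is roughly what I meant about opening an allocation and occasionally sending srun commands over to it; the -n4 and the step scripts are hypothetical:

=====

[novosirj@amarel2 ~]$ salloc -n4      # wait in the queue once, get 4 tasks
[novosirj@amarel2 ~]$ srun hostname   # runs on the allocated node(s)
[novosirj@amarel2 ~]$ srun ./step1.sh # hypothetical script, same allocation
[novosirj@amarel2 ~]$ srun ./step2.sh # no additional queue wait
[novosirj@amarel2 ~]$ exit            # release the allocation

=====

Every srun issued inside that shell lands on the already-allocated resources, so you pay the scheduling wait once rather than once per command.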