SallocDefaultCommand, specified in slurm.conf, changes the default
behavior when salloc is executed without a command appended, and also
explains the conflicting behavior seen between installations.
SallocDefaultCommand
Normally, salloc(1) will run the user's default shell
when a command to execute is not specified on the salloc command line.
If SallocDefaultCommand is specified, salloc will instead
run the configured command. The command is passed to
'/bin/sh -c', so shell metacharacters are allowed, and commands with
multiple arguments should be quoted. For instance:
SallocDefaultCommand = "$SHELL"
would run the shell in the user's $SHELL environment
variable, and
SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0
--pty --preserve-env --mpi=none $SHELL"
would spawn the user's default shell on the
allocated resources, but not consume any of the CPU or memory resources,
configure it as a pseudo-terminal, and preserve all of the job's
environment variables (i.e., not overwrite them with
the job step's allocation information).
For systems with generic resources (GRES) defined, the
SallocDefaultCommand value should explicitly specify a zero count for
the configured GRES. Failure to do so will result in the
launched shell consuming those GRES and preventing
subsequent srun commands from using them. For example, on Cray systems
add "--gres=craynetwork:0" as shown below:
SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0
--gres=craynetwork:0 --pty --preserve-env --mpi=none $SHELL"
For systems with TaskPlugin set, adding the option
"--cpu-bind=no" is recommended if the default shell should have access
to all of the CPUs allocated to the job on that node; otherwise
the shell may be limited to a single CPU or core.
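Taken together, the recommendations above suggest a single slurm.conf
entry along these lines (a sketch only; "craynetwork" applies to Cray
systems, and the exact GRES and options must match the local
configuration):

# Sketch: spawn the user's shell on one task of the allocation without
# consuming CPU, memory, or GRES, and without binding it to a single CPU.
SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0
--gres=craynetwork:0 --cpu-bind=no --pty --preserve-env --mpi=none $SHELL"

With such an entry, a bare "salloc" gives the user an interactive shell
on the allocated node while leaving the allocation's resources free for
subsequent srun commands.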
On 1/2/2019 12:38 PM, Ryan Novosielski wrote:
I don’t think that’s true (and others have shared documentation regarding
interactive jobs and the S commands). Documentation of how this works was
already shared, and it seems to have been ignored.
[novosirj@amarel2 ~]$ salloc -n1
salloc: Pending job allocation 83053985
salloc: job 83053985 queued and waiting for resources
salloc: job 83053985 has been allocated resources
salloc: Granted job allocation 83053985
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job
This is the behavior I’ve always seen. If I include a command at the end of the
line, it appears to simply run it in the “new” shell that is created by salloc
(which you’ll notice you can exit via CTRL-D or exit).
[novosirj@amarel2 ~]$ salloc -n1 hostname
salloc: Pending job allocation 83054458
salloc: job 83054458 queued and waiting for resources
salloc: job 83054458 has been allocated resources
salloc: Granted job allocation 83054458
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job
amarel2.amarel.rutgers.edu
salloc: Relinquishing job allocation 83054458
You can, however, tell it to srun something in that shell instead:
[novosirj@amarel2 ~]$ salloc -n1 srun hostname
salloc: Pending job allocation 83054462
salloc: job 83054462 queued and waiting for resources
salloc: job 83054462 has been allocated resources
salloc: Granted job allocation 83054462
salloc: Waiting for resource configuration
salloc: Nodes node073 are ready for job
node073.perceval.rutgers.edu
salloc: Relinquishing job allocation 83054462
When you use salloc, it starts an allocation and sets up the environment:
[novosirj@amarel2 ~]$ env | grep SLURM
SLURM_NODELIST=slepner012
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_MEM_PER_CPU=4096
SLURM_NNODES=1
SLURM_JOBID=83053985
SLURM_NTASKS=1
SLURM_TASKS_PER_NODE=1
SLURM_JOB_ID=83053985
SLURM_SUBMIT_DIR=/cache/home/novosirj
SLURM_NPROCS=1
SLURM_JOB_NODELIST=slepner012
SLURM_CLUSTER_NAME=amarel
SLURM_JOB_CPUS_PER_NODE=1
SLURM_SUBMIT_HOST=amarel2.amarel.rutgers.edu
SLURM_JOB_PARTITION=main
SLURM_JOB_NUM_NODES=1
If you run “srun” subsequently, it will run on the compute node, but a regular
command will run right where you are:
[novosirj@amarel2 ~]$ srun hostname
slepner012.amarel.rutgers.edu
[novosirj@amarel2 ~]$ hostname
amarel2.amarel.rutgers.edu
Again, I’d advise Mahmood to read the documentation that was already provided.
It really doesn’t matter what behavior is requested — that’s not what this
command does. If one wants to run a script on a compute node, the correct
command is sbatch. I’m not sure what advantage salloc with srun has. I assume
it’s so you can open an allocation and then occasionally send srun commands
over to it.
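For the qemu case that started the thread, the sbatch route would be a
short job script; a sketch (the resource values here are assumptions,
and qemu.sh is the user's existing wrapper script):

#!/bin/bash
#SBATCH -n1
#SBATCH -N1
#SBATCH --job-name=qemu
# Dispatch the existing qemu wrapper onto the allocated compute node.
srun ./qemu.sh

Submitted with sbatch, Slurm picks the compute node itself, which is
the dispatching behavior being asked for.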
--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'
On Jan 2, 2019, at 12:20 PM, Terry Jones <te...@jon.es> wrote:
I know very little about how SLURM works, but this sounds like it's a
configuration issue - that it hasn't been configured in a way that indicates
the login nodes cannot also be used as compute nodes. When I run salloc on the
cluster I use, I *always* get a shell on a compute node, never on the login
node that I ran salloc on.
Terry
On Wed, Jan 2, 2019 at 4:56 PM Mahmood Naderan <mahmood...@gmail.com> wrote:
Currently, users run "salloc --spankx11 ./qemu.sh" where qemu.sh is a script to
run a qemu-system-x86_64 command.
When user (1) runs that command, the qemu is run on the login node since the
user is accessing the login node. When user (2) runs that command, his qemu
process is also running on the login node and so on.
That is not what I want!
I expected slurm to dispatch the jobs on compute nodes.
Regards,
Mahmood
On Wed, Jan 2, 2019 at 7:39 PM Renfro, Michael <ren...@tntech.edu> wrote:
Not sure what the reasons are behind “have to manually ssh to a node”, but salloc
and srun can be used to allocate resources and run commands on the allocated
resources:
Before allocation, regular commands run locally, and no Slurm-related variables
are present:
=====
[renfro@login ~]$ hostname
login
[renfro@login ~]$ echo $SLURM_TASKS_PER_NODE
=====
After allocation, regular commands still run locally, Slurm-related variables
are present, and srun runs commands on the allocated node (my prompt change
inside a job is a local thing, not done by default):
=====
[renfro@login ~]$ salloc
salloc: Granted job allocation 147867
[renfro@login(job 147867) ~]$ hostname
login
[renfro@login(job 147867) ~]$ echo $SLURM_TASKS_PER_NODE
1
[renfro@login(job 147867) ~]$ srun hostname
node004
[renfro@login(job 147867) ~]$ exit
exit
salloc: Relinquishing job allocation 147867
[renfro@login ~]$
=====
Lots of people get interactive shells on a reserved node with some variant of
‘srun --pty $SHELL -I’, which doesn’t require explicitly running salloc or ssh,
so what are you trying to accomplish in the end?
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University
On Jan 2, 2019, at 9:24 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
I want to know if there is any way to push the node selection onto Slurm
rather than it being a manual thing done by the user.
Currently, I have to manually ssh to a node and try to "allocate resources"
using salloc.
Regards,
Mahmood