SallocDefaultCommand, specified in slurm.conf, changes the default
behavior when salloc is executed without a command appended, and also
explains the conflicting behavior seen between installations.
SallocDefaultCommand
Normally, salloc(1) will run the user's default shell
when a command to execute is not specified on the salloc command line.
If SallocDefaultCommand is specified, salloc will instead
run the configured command. The command is passed to
'/bin/sh -c', so shell metacharacters are allowed, and commands with
multiple arguments should be quoted. For instance:
SallocDefaultCommand = "$SHELL"
would run the shell in the user's $SHELL environment
variable, and
SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0
--pty --preserve-env --mpi=none $SHELL"
would spawn the user's default shell on the
allocated resources, but not consume any of the CPU or memory resources,
configure it as a pseudo-terminal, and preserve all of the job's
environment variables (i.e., not overwrite them with
the job step's allocation information).
For systems with generic resources (GRES) defined, the
SallocDefaultCommand value should explicitly specify a zero count for
the configured GRES. Failure to do so will result in the
launched shell consuming those GRES and preventing
subsequent srun commands from using them. For example, on Cray systems
add "--gres=craynetwork:0" as shown below:
SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0
--gres=craynetwork:0 --pty --preserve-env --mpi=none $SHELL"
For systems with TaskPlugin set, adding the option
"--cpu-bind=no" is recommended if the default shell should have access
to all of the CPUs allocated to the job on that node; otherwise
the shell may be limited to a single CPU or core.
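Taken together, the recommendations above suggest a single slurm.conf
entry along these lines (a sketch only; "craynetwork" applies to Cray
systems, and the exact GRES and options must match the local
configuration):

# Sketch: spawn the user's shell on one task of the allocation without
# consuming CPU, memory, or GRES, and without binding it to a single CPU.
SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0
--gres=craynetwork:0 --cpu-bind=no --pty --preserve-env --mpi=none $SHELL"

With such an entry, a bare "salloc" gives the user an interactive shell
on the allocated node while leaving the allocation's resources free for
subsequent srun commands.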
On 1/2/2019 12:38 PM, Ryan Novosielski wrote:
I don’t think that’s true (and others have shared documentation regarding
interactive jobs and the S commands). Documentation of how this works was
already shared, and it seems to have been ignored.
[novosirj@amarel2 ~]$ salloc -n1
salloc: Pending job allocation 83053985
salloc: job 83053985 queued and waiting for resources
salloc: job 83053985 has been allocated resources
salloc: Granted job allocation 83053985
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job
This is the behavior I’ve always seen. If I include a command at the end of the
line, it appears to simply run it in the “new” shell that is created by salloc
(which you’ll notice you can exit via CTRL-D or exit).
[novosirj@amarel2 ~]$ salloc -n1 hostname
salloc: Pending job allocation 83054458
salloc: job 83054458 queued and waiting for resources
salloc: job 83054458 has been allocated resources
salloc: Granted job allocation 83054458
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job
amarel2.amarel.rutgers.edu
salloc: Relinquishing job allocation 83054458
You can, however, tell it to srun something in that shell instead:
[novosirj@amarel2 ~]$ salloc -n1 srun hostname
salloc: Pending job allocation 83054462
salloc: job 83054462 queued and waiting for resources
salloc: job 83054462 has been allocated resources
salloc: Granted job allocation 83054462
salloc: Waiting for resource configuration
salloc: Nodes node073 are ready for job
node073.perceval.rutgers.edu
salloc: Relinquishing job allocation 83054462
When you use salloc, it starts an allocation and sets up the environment:
[novosirj@amarel2 ~]$ env | grep SLURM
SLURM_NODELIST=slepner012
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_MEM_PER_CPU=4096
SLURM_NNODES=1
SLURM_JOBID=83053985
SLURM_NTASKS=1
SLURM_TASKS_PER_NODE=1
SLURM_JOB_ID=83053985
SLURM_SUBMIT_DIR=/cache/home/novosirj
SLURM_NPROCS=1
SLURM_JOB_NODELIST=slepner012
SLURM_CLUSTER_NAME=amarel
SLURM_JOB_CPUS_PER_NODE=1
SLURM_SUBMIT_HOST=amarel2.amarel.rutgers.edu
SLURM_JOB_PARTITION=main
SLURM_JOB_NUM_NODES=1
If you run “srun” subsequently, it will run on the compute node, but a regular
command will run right where you are:
[novosirj@amarel2 ~]$ srun hostname
slepner012.amarel.rutgers.edu
[novosirj@amarel2 ~]$ hostname
amarel2.amarel.rutgers.edu
Again, I’d advise Mahmood to read the documentation that was already provided.
It really doesn’t matter what behavior is requested — that’s not what this
command does. If one wants to run a script on a compute node, the correct
command is sbatch. I’m not sure what advantage salloc with srun has. I assume
it’s so you can open an allocation and then occasionally send srun commands
over to it.
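For the qemu case that started the thread, the sbatch route would be a
short job script; a sketch (the resource values here are assumptions,
and qemu.sh is the user's existing wrapper script):

#!/bin/bash
#SBATCH -n1
#SBATCH -N1
#SBATCH --job-name=qemu
# Dispatch the existing qemu wrapper onto the allocated compute node.
srun ./qemu.sh

Submitted with sbatch, Slurm picks the compute node itself, which is
the dispatching behavior being asked for.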
--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'
On Jan 2, 2019, at 12:20 PM, Terry Jones <te...@jon.es> wrote:
I know very little about how SLURM works, but this sounds like it's a
configuration issue - that it hasn't been configured in a way that indicates
the login nodes cannot also be used as compute nodes. When I run salloc on the
cluster I use, I *always* get a shell on a compute node, never on the login
node that I ran salloc on.
Terry
On Wed, Jan 2, 2019 at 4:56 PM Mahmood Naderan <mahmood...@gmail.com> wrote:
Currently, users run "salloc --spankx11 ./qemu.sh" where qemu.sh is a script to
run a qemu-system-x86_64 command.
When user (1) runs that command, the qemu is run on the login node since the
user is accessing the login node. When user (2) runs that command, his qemu
process is also running on the login node and so on.
That is not what I want!
I expected slurm to dispatch the jobs on compute nodes.
Regards,
Mahmood
On Wed, Jan 2, 2019 at 7:39 PM Renfro, Michael <ren...@tntech.edu> wrote:
Not sure what the reasons are behind “have to manually ssh to a node”, but salloc
and srun can be used to allocate resources and run commands on the allocated
resources:
Before allocation, regular commands run locally, and no Slurm-related variables
are present:
=====
[renfro@login ~]$ hostname
login
[renfro@login ~]$ echo $SLURM_TASKS_PER_NODE
=====
After allocation, regular commands still run locally, Slurm-related variables
are present, and srun runs commands on the allocated node (my prompt change
inside a job is a local thing, not done by default):
=====
[renfro@login ~]$ salloc
salloc: Granted job allocation 147867
[renfro@login(job 147867) ~]$ hostname
login
[renfro@login(job 147867) ~]$ echo $SLURM_TASKS_PER_NODE
1
[renfro@login(job 147867) ~]$ srun hostname
node004
[renfro@login(job 147867) ~]$ exit
exit
salloc: Relinquishing job allocation 147867
[renfro@login ~]$
=====
Lots of people get interactive shells on a reserved node with some variant of
‘srun --pty $SHELL -I’, which doesn’t require explicitly running salloc or ssh,
so what are you trying to accomplish in the end?
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University
On Jan 2, 2019, at 9:24 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
I want to know if there is any way to push the node selection onto Slurm
rather than it being a manual thing done by the user.
Currently, I have to manually ssh to a node and try to "allocate resources"
using salloc.
Regards,
Mahmood