OK, it works... but then, why is it not working with --distribution=cyclic?
My command line:
sbatch --distribution=cyclic -N 5 -n 17 --ntasks-per-node=4 --output=test.txt ./n10.sh
My n10.sh script:
#!/bin/bash
#SBATCH --partition=nodo.q
source /soft/modules-3.2.10/Modules/3.2.10/init/bash
module load openmpi/1.8.1
mpirun /home/caos/druiz/samples-SLURM/OpenMPI/1.8.1/mpihello
My output file "test.txt":
Process 0 on clus01.hpc.local out of 17
Process 2 on clus01.hpc.local out of 17
Process 3 on clus01.hpc.local out of 17
Process 11 on clus04.hpc.local out of 17
Process 12 on clus04.hpc.local out of 17
Process 13 on clus04.hpc.local out of 17
Process 16 on clus05.hpc.local out of 17
Process 14 on clus05.hpc.local out of 17
Process 8 on clus03.hpc.local out of 17
Process 15 on clus05.hpc.local out of 17
Process 9 on clus03.hpc.local out of 17
Process 10 on clus03.hpc.local out of 17
Process 1 on clus01.hpc.local out of 17
Process 4 on clus02.hpc.local out of 17
Process 5 on clus02.hpc.local out of 17
Process 7 on clus02.hpc.local out of 17
Process 6 on clus02.hpc.local out of 17
My mpihello.c file (could the problem be in this file?):
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    /* print rank and host so the placement can be checked in the output */
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
    return 0;
}
As you can see, the first 4 tasks (ranks 0-3) are executed on the first node, not one on each of the first four nodes... so --distribution=cyclic is not working as I hoped :(
I'm executing my MPI program with "mpirun"... Could this be the problem? Do I need to launch it with "srun" instead?
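Maybe one of these two variants would give the placement I expect? (I haven't tested them; the srun one assumes our Open MPI 1.8 build has PMI support so srun can launch it directly.)

# Option 1: bypass mpirun and let Slurm place the tasks itself
srun --distribution=cyclic -N 5 -n 17 ./mpihello

# Option 2: keep mpirun, but ask Open MPI's own mapper to place ranks
# round-robin across the allocated nodes (--map-by is in the 1.8 series)
mpirun --map-by node ./mpihello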
Help please!
Thanks!
On 29/09/2017 at 15:39, Jeffrey Frey wrote:
From the sbatch man page [1] (https://slurm.schedmd.com/sbatch.html):
"Block distribution is the default behavior if the number of tasks
exceeds the number of allocated nodes."
"Cyclic distribution is the default behavior if the number of
tasks is no larger than the number of allocated nodes."
Both methods yield the same number of processes on each node, but the
assignment ordering differs. To get the distribution you predicted, you'd want
to use
--distribution=plane=4
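To illustrate with your exact job (-N 5 -n 17), this is the layout I'd expect from each policy, going by the man page (worth verifying on your cluster). Block and cyclic produce the same per-node counts (4,4,3,3,3) and only the rank ordering differs, while plane=4 gives the 4/4/4/4/1 split you predicted:

# block  : clus01: 0-3        clus02: 4-7        clus03: 8-10
#          clus04: 11-13      clus05: 14-16
# cyclic : clus01: 0,5,10,15  clus02: 1,6,11,16  clus03: 2,7,12
#          clus04: 3,8,13     clus05: 4,9,14
# plane=4: clus01: 0-3        clus02: 4-7        clus03: 8-11
#          clus04: 12-15      clus05: 16

sbatch --distribution=plane=4 -N 5 -n 17 --ntasks-per-node=4 --output=test.txt ./n10.sh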
On Sep 29, 2017, at 9:07 AM, sysadmin.caos [2] <[email protected]> wrote:
Hello,
When I submit a MPI job in this way:
sbatch -N 5 -n 17 --ntasks-per-node=4 --partition=nodo.q ./myscript.sh
I "think" I'm requesting 5 nodes for executing 17 processes with 4 task
per node...
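If I'm reading things right, I could double-check what the job actually got with scontrol (<jobid> being whatever ID sbatch printed):

scontrol show job <jobid> | grep -E 'NumNodes|NumTasks|NodeList'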
My "myscript.sh" is:
#!/bin/bash
source /soft/modules-3.2.10/Modules/3.2.10/init/bash
module load openmpi/1.10.2
mpirun ./mpihello
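To see where Slurm itself places the tasks, independent of MPI, I suppose I could also add a plain srun line to the script; -l prefixes every output line with its task number:

srun -l hostname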
I "supossed" that 17 processes would be allocated in this way:
4 in the first node
4 in the second node
4 in the third node
4 in the fourth node
1 in the last node
However, they are allocated in this other way:
4 in the first node
4 in the second node
3 in the third node
3 in the fourth node
3 in the last node
My slurm.conf is:
[...]
SwitchType=switch/none
TaskPlugin=task/none,task/affinity,task/cgroup
DebugFlags=CPU_Bind,Gres
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core
# COMPUTE NODES
NodeName=clus[01-12] CPUs=12 SocketsPerBoard=2 CoresPerSocket=6
ThreadsPerCore=1 RealMemory=7806 TmpDisk=81880
NodeName=clus-login CPUs=4 SocketsPerBoard=2 CoresPerSocket=2
ThreadsPerCore=1 RealMemory=15886 TmpDisk=30705
# PARTITIONS
PartitionName=nodo.q Nodes=clus[01-12] Default=YES MaxTime=8:00:00
State=UP AllocNodes=clus-login MaxCPUsPerNode=12
PartitionName=test.q Nodes=clus-login MaxTime=10:00 State=UP
AllocNodes=clus-login MaxCPUsPerNode=12
[...]
Why are the tasks being distributed in this other way? What is wrong in my SLURM configuration?
Thanks.
::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034 Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
[1] https://slurm.schedmd.com/sbatch.html
[2] mailto:[email protected]