OK, it works... but then, why is it not working with --distribution=cyclic?
My command line:
sbatch --distribution=cyclic -N 5 -n 17 --ntasks-per-node=4 --output=test.txt ./n10.sh
My n10.sh script:
#!/bin/bash
#SBATCH --partition=nodo.q
source /soft/modules-3.2.10/Modules/3.2.10/init/bash
module load openmpi/1.8.1
mpirun /home/caos/druiz/samples-SLURM/OpenMPI/1.8.1/mpihello
My output file "test.txt":
Process 0 on clus01.hpc.local out of 17
Process 2 on clus01.hpc.local out of 17
Process 3 on clus01.hpc.local out of 17
Process 11 on clus04.hpc.local out of 17
Process 12 on clus04.hpc.local out of 17
Process 13 on clus04.hpc.local out of 17
Process 16 on clus05.hpc.local out of 17
Process 14 on clus05.hpc.local out of 17
Process 8 on clus03.hpc.local out of 17
Process 15 on clus05.hpc.local out of 17
Process 9 on clus03.hpc.local out of 17
Process 10 on clus03.hpc.local out of 17
Process 1 on clus01.hpc.local out of 17
Process 4 on clus02.hpc.local out of 17
Process 5 on clus02.hpc.local out of 17
Process 7 on clus02.hpc.local out of 17
Process 6 on clus02.hpc.local out of 17
My mpihello.c file (could the problem be in this file?):
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    /* print rank and host so the placement can be checked in the output */
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
    return 0;
}
As you can see, the first 4 tasks (ranks 0-3) are executed on the first node, not one on each of the first four nodes... so --distribution=cyclic is not working as I hoped :(
I'm executing my MPI program with "mpirun"... Could this be the problem? Do I need to launch it with "srun" instead?
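Maybe one of these two variants would give the placement I expect? (I haven't tested them; the srun one assumes our Open MPI 1.8 build has PMI support so srun can launch it directly.)

# Option 1: bypass mpirun and let Slurm place the tasks itself
srun --distribution=cyclic -N 5 -n 17 ./mpihello

# Option 2: keep mpirun, but ask Open MPI's own mapper to place ranks
# round-robin across the allocated nodes (--map-by is in the 1.8 series)
mpirun --map-by node ./mpihello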
Help please!
Thanks!
On 29/09/2017 at 15:39, Jeffrey Frey wrote:
From the sbatch man page [1] (https://slurm.schedmd.com/sbatch.html):
"Block distribution is the default behavior if the number of tasks
exceeds the number of allocated nodes."
"Cyclic distribution is the default behavior if the number of
tasks is no larger than the number of allocated nodes."
Both methods yield the same number of processes on each node, but the
assignment ordering differs. To get the distribution you predicted, you'd want
to use
--distribution=plane=4
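To illustrate with your exact job (-N 5 -n 17), this is the layout I'd expect from each policy, going by the man page (worth verifying on your cluster). Block and cyclic produce the same per-node counts (4,4,3,3,3) and only the rank ordering differs, while plane=4 gives the 4/4/4/4/1 split you predicted:

# block  : clus01: 0-3        clus02: 4-7        clus03: 8-10
#          clus04: 11-13      clus05: 14-16
# cyclic : clus01: 0,5,10,15  clus02: 1,6,11,16  clus03: 2,7,12
#          clus04: 3,8,13     clus05: 4,9,14
# plane=4: clus01: 0-3        clus02: 4-7        clus03: 8-11
#          clus04: 12-15      clus05: 16

sbatch --distribution=plane=4 -N 5 -n 17 --ntasks-per-node=4 --output=test.txt ./n10.sh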
On Sep 29, 2017, at 9:07 AM, sysadmin.caos [2] <[email protected]> wrote:
Hello,
When I submit a MPI job in this way:
sbatch -N 5 -n 17 --ntasks-per-node=4 --partition=nodo.q ./myscript.sh
I "think" I'm requesting 5 nodes for executing 17 processes with 4 task
per node...
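If I'm reading things right, I could double-check what the job actually got with scontrol (<jobid> being whatever ID sbatch printed):

scontrol show job <jobid> | grep -E 'NumNodes|NumTasks|NodeList'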
My "myscript.sh" is:
#!/bin/bash
source /soft/modules-3.2.10/Modules/3.2.10/init/bash
module load openmpi/1.10.2
mpirun ./mpihello
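To see where Slurm itself places the tasks, independent of MPI, I suppose I could also add a plain srun line to the script; -l prefixes every output line with its task number:

srun -l hostname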
I "supossed" that 17 processes would be allocated in this way:
4 in the first node
4 in the second node
4 in the third node
4 in the fourth node
1 in the last node
However, they are allocated in this other way:
4 in the first node
4 in the second node
3 in the third node
3 in the fourth node
3 in the last node
My slurm.conf is:
[...]
SwitchType=switch/none
TaskPlugin=task/none,task/affinity,task/cgroup
DebugFlags=CPU_Bind,Gres
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core
# COMPUTE NODES
NodeName=clus[01-12] CPUs=12 SocketsPerBoard=2 CoresPerSocket=6
ThreadsPerCore=1 RealMemory=7806 TmpDisk=81880
NodeName=clus-login CPUs=4 SocketsPerBoard=2 CoresPerSocket=2
ThreadsPerCore=1 RealMemory=15886 TmpDisk=30705
# PARTITIONS
PartitionName=nodo.q Nodes=clus[01-12] Default=YES MaxTime=8:00:00
State=UP AllocNodes=clus-login MaxCPUsPerNode=12
PartitionName=test.q Nodes=clus-login MaxTime=10:00 State=UP
AllocNodes=clus-login MaxCPUsPerNode=12
[...]
Why are the tasks being distributed in this other way? What is wrong in my SLURM configuration?
Thanks.
::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034 Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
[1] https://slurm.schedmd.com/sbatch.html
[2] mailto:[email protected]