Hi Marcin,

Looking again at this: could you get a similar reservation again and rerun mpirun with “-display-allocation” added to the command line? I’d like to see if we are correctly parsing the number of slots assigned in the allocation.
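[Editor's note: the rerun being requested, sketched as a shell session. The allocation parameters are copied from Marcin's failing case quoted below; the commands are shown dry-run via echo because they need a live SLURM cluster.]

```shell
# Dry-run sketch of the requested rerun: the same reservation as in the
# failing case, with -display-allocation added so mpirun prints how many
# slots it parsed from the allocation. echo is used because the real
# commands require a SLURM cluster.
echo "salloc --ntasks=4 --cpus-per-task=4 --tasks-per-node=4"
echo "mpirun -display-allocation --map-by slot:pe=4 ./affinity"
```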
Ralph

> On Oct 6, 2015, at 11:52 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>
> Thank you both for your suggestion. I still cannot make this work, though, and I think - as Ralph predicted - most problems are likely related to non-homogeneous mapping of cpus to jobs. But there are problems even before that part.
>
> If I reserve one entire compute node with SLURM:
>
> salloc --ntasks=16 --tasks-per-node=16
>
> I can run my code as you suggested with _any_ N (including odd numbers!). OpenMPI will figure out the maximum number of tasks that fits and launch them. This also works for many complete nodes, but it is the only case in which I managed to get things to work.
>
> If I specify cpus per task, also allocating one full node:
>
> salloc --ntasks=4 --cpus-per-task=4 --tasks-per-node=4
>
> things go astray:
>
> mpirun --map-by slot:pe=4 ./affinity
> rank 0 @ compute-1-6.local 0, 1, 2, 3, 16, 17, 18, 19,
>
> Yes, only one MPI process was started. Running what Gilles previously suggested:
>
> $ srun grep Cpus_allowed_list /proc/self/status
> Cpus_allowed_list: 0-31
> Cpus_allowed_list: 0-31
> Cpus_allowed_list: 0-31
> Cpus_allowed_list: 0-31
>
> So the allocation seems fine. The SLURM environment is also correct, as far as I can tell:
>
> SLURM_CPUS_PER_TASK=4
> SLURM_JOB_CPUS_PER_NODE=16
> SLURM_JOB_NODELIST=c1-6
> SLURM_JOB_NUM_NODES=1
> SLURM_NNODES=1
> SLURM_NODELIST=c1-6
> SLURM_NPROCS=4
> SLURM_NTASKS=4
> SLURM_NTASKS_PER_NODE=4
> SLURM_TASKS_PER_NODE=4
>
> I do not understand why OpenMPI does not want to start more than one process. If I try to force it (-n 4), I of course get an error:
>
> mpirun --map-by slot:pe=4 -n 4 ./affinity
>
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 4 slots
> that were requested by the application:
>   ./affinity
>
> Either request fewer slots for your application, or make more slots
> available for use.
> --------------------------------------------------------------------------
>
> For clarity, I will not describe the other cases / non-contiguous cpu sets / heterogeneous nodes. Clearly something is wrong already with the simple ones.
>
> Does anyone have any ideas? Should I record some logs to see what's going on?
>
> Thanks a lot!
>
> Marcin
>
> On 10/06/2015 01:04 AM, tmish...@jcity.maeda.co.jp wrote:
>> Hi Ralph, it's been a long time.
>>
>> The option "map-by core" does not work when pe=N > 1 is specified.
>> So, you should use "map-by slot:pe=N", as far as I remember.
>>
>> Regards,
>> Tetsuya Mishima
>>
>> On 2015/10/06 5:40:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM":
>>> Hmmm… okay, try -map-by socket:pe=4
>>>
>>> We’ll still hit the asymmetric topology issue, but otherwise this should work.
>>>
>>>> On Oct 5, 2015, at 1:25 PM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>
>>>> Ralph,
>>>>
>>>> Thank you for the fast response! Sounds very good; unfortunately, I get an error:
>>>>
>>>> $ mpirun --map-by core:pe=4 ./affinity
>>>> --------------------------------------------------------------------------
>>>> A request for multiple cpus-per-proc was given, but a directive
>>>> was also given to map to an object level that cannot support that
>>>> directive.
>>>>
>>>> Please specify a mapping level that has more than one cpu, or
>>>> else let us define a default mapping that will allow multiple
>>>> cpus-per-proc.
>>>> --------------------------------------------------------------------------
>>>>
>>>> I have allocated my slurm job as
>>>>
>>>> salloc --ntasks=2 --cpus-per-task=4
>>>>
>>>> I have checked in 1.10.0 and 1.10.1rc1.
>>>>
>>>> On 10/05/2015 09:58 PM, Ralph Castain wrote:
>>>>> You would presently do:
>>>>>
>>>>> mpirun --map-by core:pe=4
>>>>>
>>>>> to get what you are seeking.
>>>>> If we don’t already set that qualifier when we see “cpus_per_task”, then we probably should do so, as there isn’t any reason to make you set it twice (well, other than trying to track which envar slurm is using now).
>>>>>
>>>>>> On Oct 5, 2015, at 12:38 PM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>
>>>>>> Yet another question about cpu binding under the SLURM environment.
>>>>>>
>>>>>> Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the purpose of cpu binding?
>>>>>>
>>>>>> Full version: when you allocate a job like, e.g., this
>>>>>>
>>>>>> salloc --ntasks=2 --cpus-per-task=4
>>>>>>
>>>>>> SLURM will allocate 8 cores in total, 4 for each 'assumed' MPI task. This is useful for hybrid jobs, where each MPI process spawns some internal worker threads (e.g., OpenMP). The intention is that 2 MPI procs are started, each of them 'bound' to 4 cores. SLURM will also set an environment variable
>>>>>>
>>>>>> SLURM_CPUS_PER_TASK=4
>>>>>>
>>>>>> which should (probably?) be taken into account by the method that launches the MPI processes to figure out the cpuset. In the case of OpenMPI + mpirun, I think something should happen in orte/mca/ras/slurm/ras_slurm_module.c, where the variable _is_ actually parsed. Unfortunately, it is never really used...
>>>>>>
>>>>>> As a result, the cpuset of all tasks started on a given compute node includes all CPU cores of all MPI tasks on that node, just as provided by SLURM (in the above example, 8). In general, there is no simple way for the user code in the MPI procs to 'split' the cores between themselves. I imagine the original intention to support this in OpenMPI was something like
>>>>>>
>>>>>> mpirun --bind-to subtask_cpuset
>>>>>>
>>>>>> with an artificial bind target that would cause OpenMPI to divide the allocated cores between the MPI tasks. Is this right? If so, it seems that at this point this is not implemented. Are there plans to do this?
>>>>>> If not, does anyone know another way to achieve that?
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>> Marcin
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27803.php
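[Editor's note: Gilles's cpuset check from the thread also works on any Linux box without srun, which makes it easy to sanity-check the reader before getting a SLURM allocation involved. A minimal sketch:]

```shell
# Print the cpuset of the current process. Under srun, each task runs this
# and the output shows the cpus that task was actually given; this is the
# same check Marcin quotes in the thread.
grep Cpus_allowed_list /proc/self/status
```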
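[Editor's note: until mpirun picks up SLURM_CPUS_PER_TASK by itself, the "set it twice" duplication Ralph mentions can be worked around in a wrapper script that derives the pe=N qualifier from the environment. A hedged sketch; the variable handling is mine, only the flag names come from the thread:]

```shell
# Build the mpirun mapping option from SLURM_CPUS_PER_TASK so the cpu
# count is specified only once, in the salloc line. Falls back to pe=1
# when the variable is unset (i.e., outside a SLURM allocation).
PE="${SLURM_CPUS_PER_TASK:-1}"
MAP="slot:pe=${PE}"
echo "mpirun --map-by ${MAP} ./affinity"
```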