Thank you both for your suggestions. I still cannot make this work, though, and I think, as Ralph predicted, that most problems are likely related to the non-homogeneous mapping of cpus to jobs. But there are problems even before that part.

If I reserve one entire compute node with SLURM:

salloc --ntasks=16 --tasks-per-node=16

I can run my code as you suggested with _any_ N (including odd numbers!). OpenMPI will figure out the maximum number of tasks that fit and launch them. This also works for many complete nodes, but it is the only case in which I managed to get it to work.

If I also specify cpus per task, again allocating one full node:

salloc --ntasks=4 --cpus-per-task=4 --tasks-per-node=4

things go astray:

mpirun --map-by slot:pe=4 ./affinity
rank 0 @ compute-1-6.local  0, 1, 2, 3, 16, 17, 18, 19,

Yes, only one MPI process was started. Running what Gilles previously suggested:

$ srun grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:    0-31
Cpus_allowed_list:    0-31
Cpus_allowed_list:    0-31
Cpus_allowed_list:    0-31
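
The same check can be made from inside a process. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names parse_cpu_list and allowed_cpus are hypothetical and not part of OpenMPI or SLURM. It expands a list such as "0-31" or "0-3,16-19" into individual CPU ids so that the counts can be compared directly.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-3,16-19' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus(status_path="/proc/self/status"):
    """Return the CPUs the calling process may run on (Linux only)."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    raise RuntimeError("Cpus_allowed_list not found in " + status_path)
```

With the output above, parse_cpu_list("0-31") yields 32 ids, i.e. every task sees the whole node rather than its own share of 4 cpus.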

So the allocation seems fine. The SLURM environment is also correct, as far as I can tell:

SLURM_CPUS_PER_TASK=4
SLURM_JOB_CPUS_PER_NODE=16
SLURM_JOB_NODELIST=c1-6
SLURM_JOB_NUM_NODES=1
SLURM_NNODES=1
SLURM_NODELIST=c1-6
SLURM_NPROCS=4
SLURM_NTASKS=4
SLURM_NTASKS_PER_NODE=4
SLURM_TASKS_PER_NODE=4
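
These values imply that four ranks of four cpus each should fit on the node. Below is a minimal sketch of that arithmetic, not of what mpirun does internally; the function names are hypothetical. It assumes SLURM_JOB_CPUS_PER_NODE uses either a plain count ("16") or SLURM's compressed repeat form ("16(x2),8").

```python
import re


def cpus_per_node(spec):
    """Expand SLURM_JOB_CPUS_PER_NODE, e.g. '16' or '16(x2),8', to one count per node."""
    counts = []
    for item in spec.split(","):
        m = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if m is None:
            raise ValueError("unexpected entry: " + item)
        # A missing '(xN)' suffix means the count applies to a single node.
        counts.extend([int(m.group(1))] * int(m.group(2) or 1))
    return counts


def expected_ranks(env):
    """Ranks that fit on each node if every rank gets SLURM_CPUS_PER_TASK cpus."""
    per_task = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    return [n // per_task for n in cpus_per_node(env["SLURM_JOB_CPUS_PER_NODE"])]
```

Called with the environment listed above (for example expected_ranks(dict(os.environ)) inside the allocation, after importing os), this gives one node with room for 4 ranks, which is what SLURM_NTASKS reports.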

I do not understand why OpenMPI does not want to start more than 1 process. If I try to force it (-n 4), I of course get an error:

mpirun --map-by slot:pe=4 -n 4 ./affinity

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  ./affinity

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------


For clarity, I will not describe the other cases (non-contiguous cpu sets, heterogeneous nodes). Clearly something is already wrong with the simple ones.

Does anyone have any ideas? Should I record some logs to see what's going on?

Thanks a lot!

Marcin






On 10/06/2015 01:04 AM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, it's been a long time.

The option "map-by core" does not work when pe=N > 1 is specified.
So, you should use "map-by slot:pe=N" as far as I remember.

Regards,
Tetsuya Mishima

On 2015/10/06 5:40:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP
tasks using SLURM":
Hmmm…okay, try -map-by socket:pe=4

We’ll still hit the asymmetric topology issue, but otherwise this should
work

On Oct 5, 2015, at 1:25 PM, marcin.krotkiewski
<marcin.krotkiew...@gmail.com> wrote:
Ralph,

Thank you for a fast response! Sounds very good, unfortunately I get an
error:
$ mpirun --map-by core:pe=4 ./affinity

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that cannot support that
directive.

Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.

--------------------------------------------------------------------------
I have allocated my slurm job as

salloc --ntasks=2 --cpus-per-task=4

I have checked in 1.10.0 and 1.10.1rc1.




On 10/05/2015 09:58 PM, Ralph Castain wrote:
You would presently do:

mpirun --map-by core:pe=4

to get what you are seeking. If we don’t already set that qualifier
when we see “cpus_per_task”, then we probably should do so as there isn’t
any reason to make you set it twice (well, other than
trying to track which envar slurm is using now).

On Oct 5, 2015, at 12:38 PM, marcin.krotkiewski
<marcin.krotkiew...@gmail.com> wrote:
Yet another question about cpu binding under the SLURM environment.

Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the
purpose of cpu binding?

Full version: When you allocate a job like, e.g., this

salloc --ntasks=2 --cpus-per-task=4

SLURM will allocate 8 cores in total, 4 for each 'assumed' MPI task.
This is useful for hybrid jobs, where each MPI process spawns some internal
worker threads (e.g., OpenMP). The intention is
that there are 2 MPI procs started, each of them 'bound' to 4 cores.
SLURM will also set an environment variable
SLURM_CPUS_PER_TASK=4

which should (probably?) be taken into account by the method that
launches the MPI processes to figure out the cpuset. In case of OpenMPI +
mpirun I think something should happen in
orte/mca/ras/slurm/ras_slurm_module.c, where the variable _is_ actually
parsed. Unfortunately, it is never really used...
As a result, cpuset of all tasks started on a given compute node
includes all CPU cores of all MPI tasks on that node, just as provided by
SLURM (in the above example - 8). In general, there is
no simple way for the user code in the MPI procs to 'split' the cores
between themselves. I imagine the original intention to support this in
OpenMPI was something like
mpirun --bind-to subtask_cpuset

with an artificial bind target that would cause OpenMPI to divide the
allocated cores between the mpi tasks. Is this right? If so, it seems that
at this point this is not implemented. Are there
plans to do this? If not, does anyone know another way to achieve that?
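
To illustrate the kind of split meant here, the following is a minimal sketch of dividing a node's cpuset between the local ranks in user code. It is not an OpenMPI feature: split_cpuset and bind_self_to_share are hypothetical names, the binding part is Linux-only, and it assumes the processes were launched by mpirun so that OMPI_COMM_WORLD_LOCAL_RANK and OMPI_COMM_WORLD_LOCAL_SIZE are set.

```python
import os


def split_cpuset(cpus, local_size, local_rank):
    """Return the contiguous share of `cpus` for one of `local_size` ranks."""
    cpus = sorted(cpus)
    if local_size <= 0 or not 0 <= local_rank < local_size:
        raise ValueError("invalid local rank/size")
    per_rank = len(cpus) // local_size
    if per_rank == 0:
        raise ValueError("fewer cpus than local ranks")
    start = local_rank * per_rank
    return cpus[start:start + per_rank]


def bind_self_to_share():
    """Restrict the calling process to its share of the node cpuset (Linux only)."""
    size = int(os.environ["OMPI_COMM_WORLD_LOCAL_SIZE"])
    rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
    share = split_cpuset(os.sched_getaffinity(0), size, rank)
    os.sched_setaffinity(0, share)
    return share
```

For the example above (8 cores, 2 tasks), rank 0 would get cpus 0-3 and rank 1 cpus 4-7. The split is by sorted cpu id only, so it ignores sockets and hardware-thread siblings and would need the real topology to be correct in general.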
Thanks a lot!

Marcin



_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/10/27803.php
