Marcin,

could you give v1.10.1rc1, which was released today, a try?
it fixes a bug where hwloc was trying to bind outside the cpuset.

Ralph and all,

imho, there are several issues here:
- if slurm allocates threads instead of cores, then the --oversubscribe
mpirun option could be mandatory
- with --oversubscribe --hetero-nodes, mpirun should not fail (see the
example command below); if it still fails with v1.10.1rc1, I will ask for
some more details in order to fix ompi
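
For example, combining those options with the command line from Marcin's
report below would give something along the lines of:

mpirun --oversubscribe --hetero-nodes --report-bindings --bind-to core \
    -np 64 ./affinity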

Cheers,

Gilles

On Saturday, October 3, 2015, Ralph Castain <r...@open-mpi.org> wrote:

> Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating
> HTs as “cores” - i.e., as independent cpus. Any chance that is true?
>
> I’m wondering because bind-to core will attempt to bind your proc to both
> HTs on the core. For some reason, we thought that 8,24 were HTs on the same
> core, which is why we tried to bind to that pair of HTs. We got an error
> because HT #24 was not allocated to us on node c6, but HT #8 was.
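>
> For illustration, the failing call boils down to something like the
> following standalone hwloc sketch (hypothetical code, not the actual
> odls_default_module.c logic):
>
> #include <hwloc.h>
> #include <stdio.h>
>
> int main(void)
> {
>     hwloc_topology_t topo;
>     hwloc_bitmap_t set = hwloc_bitmap_alloc();
>
>     hwloc_topology_init(&topo);
>     hwloc_topology_load(&topo);
>
>     /* Request binding to PUs 8 and 24, as bind-to core does for the
>      * two HTs it believes belong to one core. */
>     hwloc_bitmap_set(set, 8);
>     hwloc_bitmap_set(set, 24);
>
>     /* In the report below, the equivalent call inside Open MPI returned
>      * "Error" for bitmap "8,24", reportedly because PU 24 was not
>      * allocated to the job on node c6-6. */
>     if (hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS) < 0)
>         perror("hwloc_set_cpubind");
>
>     hwloc_bitmap_free(set);
>     hwloc_topology_destroy(topo);
>     return 0;
> }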
>
>
> > On Oct 3, 2015, at 2:43 AM, marcin.krotkiewski <
> marcin.krotkiew...@gmail.com> wrote:
> >
> > Hi, Ralph,
> >
> > I submit my slurm job as follows
> >
> > salloc --ntasks=64 --mem-per-cpu=2G --time=1:0:0
> >
> > Effectively, the allocated CPU cores are spread among many cluster
> nodes. SLURM uses cgroups to limit the CPU cores available for mpi
> processes running on a given cluster node. Compute nodes are 2-socket,
> 8-core E5-2670 systems with HyperThreading on:
> >
> > node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> > node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> > node distances:
> > node   0   1
> >  0:  10  21
> >  1:  21  10
> >
> > I run MPI program with command
> >
> > mpirun  --report-bindings --bind-to core -np 64 ./affinity
> >
> > The program simply runs sched_getaffinity for each process and prints
> out the result.
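> >
> > The exact source was not posted, but a minimal sketch of such a test
> > (plain MPI plus Linux sched_getaffinity) could look like:
> >
> > #define _GNU_SOURCE
> > #include <sched.h>
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > int main(int argc, char **argv)
> > {
> >     int rank, cpu, off = 0;
> >     char buf[1024] = "";
> >     cpu_set_t mask;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >     /* Query the affinity mask the process is actually running with. */
> >     CPU_ZERO(&mask);
> >     if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
> >         perror("sched_getaffinity");
> >         MPI_Abort(MPI_COMM_WORLD, 1);
> >     }
> >
> >     /* Collect the allowed CPU ids and print one line per rank. */
> >     for (cpu = 0; cpu < CPU_SETSIZE && off < (int)sizeof(buf) - 8; cpu++)
> >         if (CPU_ISSET(cpu, &mask))
> >             off += snprintf(buf + off, sizeof(buf) - off, "%d ", cpu);
> >     printf("rank %d: cpus %s\n", rank, buf);
> >
> >     MPI_Finalize();
> >     return 0;
> > }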
> >
> > -----------
> > TEST RUN 1
> > -----------
> > For this particular job the problem is more severe: openmpi fails to run
> at all with the following error:
> >
> >
> --------------------------------------------------------------------------
> > Open MPI tried to bind a new process, but something went wrong.  The
> > process was killed without launching the target application.  Your job
> > will now abort.
> >
> >  Local host:        c6-6
> >  Application name:  ./affinity
> >  Error message:     hwloc_set_cpubind returned "Error" for bitmap "8,24"
> >  Location:          odls_default_module.c:551
> >
> --------------------------------------------------------------------------
> >
> > This is SLURM environment variables:
> >
> > SLURM_JOBID=12712225
> >
> SLURM_JOB_CPUS_PER_NODE='3(x2),2,1(x3),2(x2),1,3(x3),5,1,4,1,3,2,3,7,1,5,6,1'
> > SLURM_JOB_ID=12712225
> >
> SLURM_JOB_NODELIST='c6-[3,6-8,12,14,17,22-23],c8-[4,7,9,17,20,28],c15-[5,10,18,20,22-24,28],c16-11'
> > SLURM_JOB_NUM_NODES=24
> > SLURM_JOB_PARTITION=normal
> > SLURM_MEM_PER_CPU=2048
> > SLURM_NNODES=24
> >
> SLURM_NODELIST='c6-[3,6-8,12,14,17,22-23],c8-[4,7,9,17,20,28],c15-[5,10,18,20,22-24,28],c16-11'
> > SLURM_NODE_ALIASES='(null)'
> > SLURM_NPROCS=64
> > SLURM_NTASKS=64
> > SLURM_SUBMIT_DIR=/cluster/home/marcink
> > SLURM_SUBMIT_HOST=login-0-2.local
> >
> SLURM_TASKS_PER_NODE='3(x2),2,1(x3),2(x2),1,3(x3),5,1,4,1,3,2,3,7,1,5,6,1'
> >
> > There are also a lot of warnings like
> >
> > [compute-6-6.local:20158] MCW rank 4 is not bound (or bound to all
> available processors)
> >
> >
> > -----------
> > TEST RUN 2
> > -----------
> >
> > In another allocation I got a different error
> >
> >
> --------------------------------------------------------------------------
> > A request was made to bind to that would result in binding more
> > processes than cpus on a resource:
> >
> >   Bind to:     CORE
> >   Node:        c6-19
> >   #processes:  2
> >   #cpus:       1
> >
> > You can override this protection by adding the "overload-allowed"
> > option to your binding directive.
> >
> --------------------------------------------------------------------------
> >
> > and the allocation was the following
> >
> > SLURM_JOBID=12712250
> > SLURM_JOB_CPUS_PER_NODE='3(x2),2,1,15,1,3,16,2,1,3(x2),2,5,4'
> > SLURM_JOB_ID=12712250
> > SLURM_JOB_NODELIST='c6-[3,6-8,12,14,17,19,22-23],c8-[4,7,9,17,28]'
> > SLURM_JOB_NUM_NODES=15
> > SLURM_JOB_PARTITION=normal
> > SLURM_MEM_PER_CPU=2048
> > SLURM_NNODES=15
> > SLURM_NODELIST='c6-[3,6-8,12,14,17,19,22-23],c8-[4,7,9,17,28]'
> > SLURM_NODE_ALIASES='(null)'
> > SLURM_NPROCS=64
> > SLURM_NTASKS=64
> > SLURM_SUBMIT_DIR=/cluster/home/marcink
> > SLURM_SUBMIT_HOST=login-0-2.local
> > SLURM_TASKS_PER_NODE='3(x2),2,1,15,1,3,16,2,1,3(x2),2,5,4'
> >
> >
> > If in this case I run on only 32 cores
> >
> > mpirun  --report-bindings --bind-to core -np 32 ./affinity
> >
> > the process starts, but I get the original binding problem:
> >
> > [compute-6-8.local:31414] MCW rank 8 is not bound (or bound to all
> available processors)
> >
> > Running with --hetero-nodes yields exactly the same results
> >
> >
> >
> >
> >
> > Hope the above is useful. The problem with binding under SLURM with CPU
> cores spread over nodes seems to be very reproducible. Very often OpenMPI
> dies with an error like the above. These tests were run with
> openmpi-1.8.8 and 1.10.0, both giving the same results.
> >
> > One more suggestion. The warning message (MCW rank 8 is not bound...) is
> ONLY displayed when I use --report-bindings. It is never shown if I leave
> out this option, and although the binding is wrong the user is not
> notified. I think it would be better to show this warning in all cases
> where binding fails.
> >
> > Let me know if you need more information. I can help to debug this - it
> is a rather crucial issue.
> >
> > Thanks!
> >
> > Marcin
> >
> >
> >
> >
> >
> >
> > On 10/02/2015 11:49 PM, Ralph Castain wrote:
> >> Can you please send me the allocation request you made (so I can see
> what you specified on the cmd line), and the mpirun cmd line?
> >>
> >> Thanks
> >> Ralph
> >>
> >>> On Oct 2, 2015, at 8:25 AM, Marcin Krotkiewski <
> marcin.krotkiew...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I fail to make OpenMPI bind to cores correctly when running from
> within SLURM-allocated CPU resources spread over a range of compute nodes
> in an otherwise homogeneous cluster. I have found this thread
> >>>
> >>> http://www.open-mpi.org/community/lists/users/2014/06/24682.php
> >>>
> >>> and did try to use what Ralph suggested there (--hetero-nodes), but it
> does not work (v. 1.10.0). When running with --report-bindings I get
> messages like
> >>>
> >>> [compute-9-11.local:27571] MCW rank 10 is not bound (or bound to all
> available processors)
> >>>
> >>> for all ranks outside of my first physical compute node. Moreover,
> everything works as expected if I ask SLURM to assign entire compute nodes.
> So it does look like Ralph's diagnosis presented in that thread is correct,
> just the --hetero-nodes switch does not work for me.
> >>>
> >>> I have written a short code that uses sched_getaffinity to print the
> effective bindings: all MPI ranks except those on the first node are
> bound to all CPU cores allocated by SLURM.
> >>>
> >>> Do I have to do something besides --hetero-nodes, or is this a
> problem that needs further investigation?
> >>>
> >>> Thanks a lot!
> >>>
> >>> Marcin
> >>>
