Bennet,
my guess is mapping/binding to sockets was deemed the best compromise from an
"out of the box" performance point of view.
iirc, we did fix some bugs that occurred when running under asymmetric
cpusets/cgroups.
if you still have some issues with the latest Open MPI version (2.0.1) …
Pardon my naivete, but why is bind-to-none not the default, so that if the
user wants to specify something, they can then get into trouble
knowingly? We have had all manner of problems with binding when using
cpusets/cgroups.
-- bennet
On Thu, Sep 29, 2016 at 9:52 PM, Gilles Gouaillardet wrote:
> …
David,
I guess you would have expected the default mapping/binding scheme to be
core instead of socket.
iirc, we decided *not* to bind to cores by default because it is "safer":
if you simply run
OMP_NUM_THREADS=8 mpirun -np 2 a.out
then a default mapping/binding scheme of core means the OpenMP threads …
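To make the effect visible, a small hybrid test along the following lines (a generic sketch, not code from this thread) prints where each OpenMP thread of each rank ends up. Launched as OMP_NUM_THREADS=8 mpirun -np 2 ./a.out with binding to core, all eight threads of a rank stay confined to a single core; with binding to socket or no binding they can spread out:

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel
        {
            /* each OpenMP thread reports the CPU it is currently running on */
            printf("rank %d, thread %d, cpu %d\n",
                   rank, omp_get_thread_num(), sched_getcpu());
        }
        MPI_Finalize();
        return 0;
    }

(Build with mpicc -fopenmp; sched_getcpu() is glibc-specific.)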
Rick,
can you please provide some more information:
- Open MPI version
- interconnect used
- number of tasks / number of nodes
- does the hang occur in the first MPI_Bcast of 8000 bytes?
note there is a known issue if you MPI_Bcast with different but matching
signatures
(e.g. some tasks …
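Presumably "different but matching signatures" means something like the sketch below (my own illustration, not code from this thread): the root broadcasts 8000 elements of MPI_BYTE while the other ranks pass one element of a contiguous datatype built from 8000 MPI_BYTE, so the type signatures match even though the (count, datatype) arguments differ:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char buf[8000] = {0};
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* root: 8000 elements of MPI_BYTE */
            MPI_Bcast(buf, 8000, MPI_BYTE, 0, MPI_COMM_WORLD);
        } else {
            /* other ranks: 1 element of a contiguous type covering 8000 bytes;
               same type signature, different (count, datatype) pair */
            MPI_Datatype blk;
            MPI_Type_contiguous(8000, MPI_BYTE, &blk);
            MPI_Type_commit(&blk);
            MPI_Bcast(buf, 1, blk, 0, MPI_COMM_WORLD);
            MPI_Type_free(&blk);
        }

        MPI_Finalize();
        return 0;
    }

This call pattern is legal MPI, but it is the kind of usage the known issue above is about.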
Folks;
I am attempting to set up a task that sends large messages via the
MPI_Bcast API. I am finding that small messages work OK, anything less than 8000
bytes. Anything more than this and the whole scenario hangs, with most of the
worker processes pegged at 100% CPU usage. Tried some …
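A bare reproducer for this kind of report (a sketch of my own, assuming a plain MPI_Bcast of N bytes is enough to trigger it, not Rick's actual code) would be:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        /* message size in bytes; vary it around 8000 to bracket the hang */
        int n = (argc > 1) ? atoi(argv[1]) : 16000;
        int rank;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        buf = malloc(n);
        if (rank == 0)
            memset(buf, 1, n);
        MPI_Bcast(buf, n, MPI_BYTE, 0, MPI_COMM_WORLD);
        printf("rank %d: broadcast of %d bytes completed\n", rank, n);

        free(buf);
        MPI_Finalize();
        return 0;
    }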
Hello All,
Would anyone know why the default mapping scheme is socket for jobs with
more than 2 ranks? Could they please take some time to explain the
reasoning? Please note I am not railing against the decision, but rather
trying to gather as much information about it as I can, so …
Hi Gilles et al.,
You are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic process
support was/is not working in OMPI when using PSM2 because of an issue related
to the transport keys. This was fixed in PR #1602
(https://github.com/open-mpi/ompi/pull/1602) and should be included in v2.0.
The solution was to use the "tcp", "sm" and "self" BTLs for the transport
of MPI messages, restricting TCP to communicate only over the eth0 interface
and using ob1 as the p2p management layer:
mpirun --mca btl_tcp_if_include eth0 --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager
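For context, the "./manager" launched above would typically be doing a plain MPI_Comm_spawn; a minimal manager along these lines (a generic sketch, with "./worker" and the worker count as placeholders rather than details from this thread) is:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm workers;
        int nworkers = 3;
        int errcodes[3];

        MPI_Init(&argc, &argv);

        /* spawn the workers; the hostfile / mapping policy decides placement */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, errcodes);

        /* trivial sanity check: send each worker its id over the intercommunicator */
        for (int i = 0; i < nworkers; i++)
            MPI_Send(&i, 1, MPI_INT, i, 0, workers);

        MPI_Comm_disconnect(&workers);
        MPI_Finalize();
        return 0;
    }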
Ah, that may be why it wouldn’t show up in the OMPI code base itself. If that
is the case here, then no - OMPI v2.0.1 does not support comm_spawn for PSM. It
is fixed in the upcoming 2.0.2.
> On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet wrote:
>
> Ralph,
>
> My guess is that ptl.c comes …
Ralph,
My guess is that ptl.c comes from PSM lib ...
Cheers,
Gilles
On Thursday, September 29, 2016, r...@open-mpi.org wrote:
> Spawn definitely does not work with srun. I don’t recognize the name of
> the file that segfaulted - what is “ptl.c”? Is that in your manager program?
>
> On Sep 29, 2016, at 6:06 AM, …
Spawn definitely does not work with srun. I don’t recognize the name of the
file that segfaulted - what is “ptl.c”? Is that in your manager program?
> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet wrote:
>
> Hi,
>
> I do not expect spawn can work with direct launch (e.g. srun)
>
> Do you have PSM (e.g. Infinipath) hardware? …
Hi,
I do not expect spawn can work with direct launch (e.g. srun).
Do you have PSM (e.g. Infinipath) hardware? That could be linked to the
failure.
Can you please try
mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager 1
and see if it helps?
Note if you have the possibility …
Hello,
I am using MPI_Comm_spawn to dynamically create new processes from a single
manager process. Everything works fine when all the processes are running
on the same node. But imposing a restriction to run only a single process per
node does not work. Below are the errors produced during multinode …
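The worker side of such a manager/worker setup (again a generic sketch rather than the poster's actual code) only needs to pick up the intercommunicator to its parent:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm parent;
        int id, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL) {
            fprintf(stderr, "this program is meant to be spawned by a manager\n");
            MPI_Finalize();
            return 1;
        }

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* receive the id the manager sent over the intercommunicator */
        MPI_Recv(&id, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        printf("worker %d received id %d from the manager\n", rank, id);

        MPI_Comm_disconnect(&parent);
        MPI_Finalize();
        return 0;
    }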