date:20170329

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-29 Thread Gilles Gouaillardet


Hi,


Yes, please open an issue on GitHub and post your configure and mpirun
command lines.


Ideally, could you also try the latest v1.10.6 and v2.1.0?


If you can reproduce the issue with a smaller number of MPI tasks, that
would be great too.



Cheers,


Gilles


On 3/28/2017 11:19 PM, Götz Waschk wrote:

Hi everyone,

so how do I proceed with this problem, do you need more information?
Should I open a bug report on github?

Regards, Götz Waschk
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users



Re: [OMPI users] Communicating MPI processes running in Docker containers in the same host by means of shared memory?

2017-03-29 Thread Jordi Guitart

Hi,

I'll try to offer some ideas on how this could be accomplished (see
inline). Do they seem feasible?

On 26/03/2017 18:18, r...@open-mpi.org wrote:

There are a couple of things you’d need to resolve before worrying about code:

* IIRC, there is a separate ORTE daemon in each Docker container since OMPI
thinks these are separate nodes. So you’ll first need to find some way those
daemons can “discover” that they are on the same physical node. Is there
something in the container environment that could be used for this purpose?
Following the idea in this example
(https://docs.docker.com/engine/userguide/networking/work-with-networks/#basic-container-networking-example),
you could create a bridge network connecting (some of) the containers
running in the same physical host. Each container could use the 'docker
network inspect' command to obtain the list of containers connected to
that bridge network. Note that this requires exposing the Docker socket
to the container, by bind-mounting it with the -v flag.
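A minimal sketch of the discovery step described above, assuming the container can run 'docker network inspect' and capture its JSON output. The network name, container names, and addresses in the sample are made up for illustration; only the "Containers", "Name", and "IPv4Address" fields of the output are used.

```python
import json

# Hypothetical, trimmed sample of `docker network inspect <network>` output.
# The network name, container IDs, names, and addresses are invented.
SAMPLE_INSPECT_OUTPUT = """
[
  {
    "Name": "mpi_bridge",
    "Driver": "bridge",
    "Containers": {
      "3386a527aa08": {"Name": "rank0", "IPv4Address": "172.18.0.2/16"},
      "94447ca47985": {"Name": "rank1", "IPv4Address": "172.18.0.3/16"}
    }
  }
]
"""


def peers_on_network(inspect_json):
    """Return a sorted list of (container_name, ip) pairs attached to the network."""
    peers = []
    for network in json.loads(inspect_json):
        # "Containers" maps container IDs to per-container details.
        for info in (network.get("Containers") or {}).values():
            ip = info["IPv4Address"].split("/")[0]  # drop the CIDR prefix length
            peers.append((info["Name"], ip))
    return sorted(peers)


peers = peers_on_network(SAMPLE_INSPECT_OUTPUT)
for name, ip in peers:
    print(name, ip)
```

Inside a container, the sample string would be replaced by the real command output, for example captured with subprocess.run(["docker", "network", "inspect", "mpi_bridge"], capture_output=True, text=True). That requires the Docker CLI in the image and the Docker socket bind-mounted as noted above.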

* Once the daemons can determine they are on a shared node, then you have to be
able to create a shared memory backing file that can be accessed from within
any of the containers. In other words, one of the procs in one of the
containers is going to have to create the backing file, and then pass the
filename to the other procs on that physical node. Then those other procs need
to be able to open that file from within their container.
As shown here
(https://github.com/docker/docker/pull/8211#issuecomment-56873448), it
would be possible to start a container (CONTAINER_ID) that creates a
shared memory segment, and then start the other containers with the
--ipc=container:CONTAINER_ID option so that they can access the first
container's shared memory segment.
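The backing-file handshake described above (one process creates the file, passes the filename on, and the others open it) can be sketched as follows. This is only an illustration using Python's mmap module, not how Open MPI implements its shared memory transport. The helper names are hypothetical, both "processes" are simulated inside a single process to keep the example runnable, and whether the path is visible from other containers depends on how they are started.

```python
import mmap
import os
import tempfile

SEGMENT_SIZE = 4096  # bytes; arbitrary size chosen for the illustration


def create_backing_file(directory, size=SEGMENT_SIZE):
    """Create a zero-filled backing file and return its path."""
    fd, path = tempfile.mkstemp(prefix="shm_backing_", dir=directory)
    try:
        os.ftruncate(fd, size)
    finally:
        os.close(fd)
    return path


def attach(path, size=SEGMENT_SIZE):
    """Map an existing backing file read/write and return the mapping."""
    with open(path, "r+b") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), size)


# Prefer /dev/shm when it is writable, otherwise fall back to the temp dir.
directory = "/dev/shm" if os.access("/dev/shm", os.W_OK) else tempfile.gettempdir()

# "Process A" creates the file; "process B" would receive `path` and attach.
path = create_backing_file(directory)
writer = attach(path)
reader = attach(path)

writer[:5] = b"hello"          # A writes into the shared segment
received = bytes(reader[:5])   # B sees the same bytes through its own mapping
print(received)

writer.close()
reader.close()
os.remove(path)
```

In a real setup the two attach calls would run in different processes, and the path would be exchanged through whatever channel the daemons already share.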

Are those doable in Docker? Note that Singularity doesn’t have these issues
because it only abstracts the file system, and so every container “sees” that
it is on the same node (and the ORTE daemon sits outside the container). This
is why we push people in that direction for HPC with containers.

Ralph

On Mar 25, 2017, at 8:07 AM, Jordi Guitart wrote:

Hi,

I have no previous experience with the OpenMPI source code, so I don't have
a clear idea of the changes needed to implement this feature. It probably
requires some preliminary brainstorming to decide the most appropriate way to
inform OpenMPI that the underlying nodes can share memory even if they have
different IP addresses.

On 24/03/2017 20:10, Jeff Squyres (jsquyres) wrote:

On Mar 24, 2017, at 6:41 AM, Jordi Guitart wrote:

Docker containers have different IP addresses, indeed, so now we know why it
does not work. I think that this could be a nice feature for OpenMPI, so I'll
probably issue a request for it ;-)

Cool.

I don't think any of the current developers in the Open MPI community are
actively working with Docker (several are working with Singularity). Would
this be a feature you'd be willing to submit a patch for?

http://bsc.es/disclaimer

Re: [OMPI users] Failed to create a queue pair (QP) error

2017-03-29 Thread Ilchenko Evgeniy

Hi!

I installed OpenMPI version 1.10.6,
but now I get other problems.

I built OpenMPI with Java bindings (--enable-mpi-java),
but segmentation faults occur randomly,
even for programs without any communication
(just MPI.Init and MPI.Finalize).

My test program is attached.
This program fails with a segfault on a random iteration (usually 100-300),
even with a single MPI process (mpirun -np 1).
I don't pass any arguments to mpirun or to java (only the path to the Java class).
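
Since the failure is intermittent, one way to collect data for a bug report is to run the same command repeatedly and record the first run that exits with a non-zero status. The sketch below is a generic harness, not part of Open MPI. The function name is hypothetical, and the stand-in commands only exist so the example runs anywhere.

```python
import subprocess
import sys


def first_failing_run(command, max_runs):
    """Run `command` up to max_runs times.

    Returns (run_index, returncode) for the first run that exits non-zero,
    or None if every run succeeds.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return run, result.returncode
    return None


# Stand-in commands: one that always succeeds, one that always exits with 3.
ok = first_failing_run([sys.executable, "-c", "pass"], max_runs=3)
bad = first_failing_run([sys.executable, "-c", "import sys; sys.exit(3)"], max_runs=3)
print(ok, bad)
```

For the case above, the command would be something like ["mpirun", "-np", "1", "java", "bcastTest"], where the class name is assumed from the attachment's file name.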

Any ideas?


bcastTest.java
Description: Binary data
