Re: [OMPI users] IBV_EVENT_QP_ACCESS_ERR

2013-01-23 Thread Shamis, Pavel
> have a user whose code at scale dies reliably with the errors (new hosts each > time): > > We have been using for this code: > -mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12 > > Without that option it dies with an out-of-memory message reliably. > > Note this code runs fine

[OMPI users] IBV_EVENT_QP_ACCESS_ERR

2013-01-23 Thread Brock Palen
have a user whose code at scale dies reliably with the errors (new hosts each time): We have been using for this code: -mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12 Without that option it dies with an out-of-memory message reliably. Note this code runs fine at the same scale
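The receive_queues value quoted above is a colon-separated list of queue definitions, each a comma-separated record. A minimal sketch of how such a string breaks down, assuming the usual openib BTL layout (queue type letter, then buffer size in bytes, then number of buffers, then optional tuning fields); the names `ReceiveQueue` and `parse_receive_queues` are made up for illustration and are not part of Open MPI:

```python
from typing import NamedTuple, Tuple

# Queue-type letters as commonly documented for the openib BTL (assumption:
# P = per-peer, S = shared receive queue, X = XRC).
QUEUE_TYPES = {"P": "per-peer", "S": "shared (SRQ)", "X": "XRC"}


class ReceiveQueue(NamedTuple):
    kind: str                # human-readable queue type
    buffer_bytes: int        # size of each receive buffer
    num_buffers: int         # number of buffers posted for this queue
    extra: Tuple[int, ...]   # any further optional fields, left uninterpreted


def parse_receive_queues(spec):
    """Split a btl_openib_receive_queues string into its queue records."""
    queues = []
    for entry in spec.split(":"):
        fields = entry.split(",")
        if len(fields) < 3 or fields[0] not in QUEUE_TYPES:
            raise ValueError(f"malformed queue entry: {entry!r}")
        extra = tuple(int(f) for f in fields[3:])
        queues.append(
            ReceiveQueue(QUEUE_TYPES[fields[0]], int(fields[1]), int(fields[2]), extra)
        )
    return queues


queues = parse_receive_queues("X,4096,128:X,12288,128:X,65536,12")
for q in queues:
    # buffer_bytes * num_buffers is only a rough per-queue figure, not the
    # memory Open MPI actually registers.
    print(q.kind, q.buffer_bytes, q.num_buffers, q.buffer_bytes * q.num_buffers)
```

Read this way, the setting in the post asks for three XRC queues with 4 KiB, 12 KiB and 64 KiB buffers, the largest with only 12 buffers.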

Re: [OMPI users] MXM vs OpenIB

2013-01-23 Thread Shamis, Pavel
>>> You sound like our vendors, "what is your app" >> >> ;-) I used to be one. >> >> Ideally OMPI should do the switch between MXM/RC/XRC internally in the >> transport layer. Unfortunately, >> we don't have such smart selection logic. Hopefully IB vendors will fix some >> day. > > I actua

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-23 Thread Ralph Castain
I suspect the problem is that the rsh/ssh launcher is attempting to use a tree pattern for launching the apps - i.e., mpirun launches a daemon on the first couple of nodes, and then those daemons launch daemons on the next level. If rsh/ssh isn't supported on those backend nodes, then this won't
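The tree launch pattern described in this message can be pictured with a toy model. The sketch below is not Open MPI's launcher code; it assumes a simple k-ary tree with a hypothetical fan-out of 2 and made-up host names, and only shows which hosts would have to start daemons on other hosts (and so need working rsh/ssh themselves):

```python
def tree_launch_plan(nodes, fanout=2):
    """Return {launcher: [hosts it starts daemons on]} for a k-ary tree.

    "mpirun" is the root of the tree. Illustrative model only; the real
    launcher's tree shape and fan-out may differ.
    """
    order = ["mpirun"] + list(nodes)
    plan = {}
    for i, child in enumerate(order[1:], start=1):
        parent = order[(i - 1) // fanout]   # parent index in a k-ary heap layout
        plan.setdefault(parent, []).append(child)
    return plan


def hosts_needing_ssh(plan):
    """Hosts other than mpirun's own that must launch daemons elsewhere."""
    return sorted(p for p in plan if p != "mpirun")


two = tree_launch_plan(["node1", "node2"])
four = tree_launch_plan(["node1", "node2", "node3", "node4"])
print(two, hosts_needing_ssh(two))
print(four, hosts_needing_ssh(four))
```

In this model, two hosts are both launched directly from mpirun, while a third host is the first one that has to be launched from another backend node.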

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-23 Thread Ada Mancuso
Yes, I can, but with at most two machines as slaves and one machine as master. If I try to add another slave, I get those errors. On 23 Jan 2013 14:38, "Jeff Squyres (jsquyres)" wrote: > I'm not sure I understand you. Does Open MPI work across multiple > machines? I.e., can you d

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-23 Thread Jeff Squyres (jsquyres)
I'm not sure I understand you. Does Open MPI work across multiple machines? I.e., can you do all three of those steps across multiple machines? On Jan 23, 2013, at 8:16 AM, Ada Mancuso wrote: > I'm sure that openmpi works, moreover my problem happens only with more than 2 > slaves (on differ

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-23 Thread Ada Mancuso
I'm sure that Open MPI works; moreover, my problem happens only with more than 2 slaves (on different machines, while locally it works fine with any number of slaves). Thanks, Ada On 23 Jan 2013 14:04, "Jeff Squyres (jsquyres)" wrote: > Are you able to run the C examples in the example

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-23 Thread Jeff Squyres (jsquyres)
Are you able to run the C examples in the examples/ directory from the tarball? Our README suggests the following: - When verifying a new Open MPI installation, we recommend running three tests: 1. Use "mpirun" to launch a non-MPI program (e.g., hostname or uptime) across multiple nodes.

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-23 Thread Ada Mancuso
Hi, I've installed the latest snapshot taken from the svn developer's trunk, but I had the same problems. This is my configuration: - Ubuntu, kernel 2.6.38-8 - OpenSSH 5.8p1, OpenSSL 0.9.8o - Libtool version 2.4 - Open MPI 1.7rc5 and latest snapshots. Do you think my problem could be relate

Re: [OMPI users] OMPI 1.6.3, InfiniBand and MTL MXM; unable to make it work!

2013-01-23 Thread Alina Sklarevich
Some more info: The MOFED that you will download includes MXM, but an older version (v1.1). A new version of MXM (v1.5) is available. So, after installing MOFED, please remove the bundled MXM (rpm -e mxm) and download the new MXM (v1.5) from: http://www.mellanox.com/page/products_

Re: [OMPI users] OMPI 1.6.3, InfiniBand and MTL MXM; unable to make it work!

2013-01-23 Thread Alina Sklarevich
Hello Francesco, Please download and install MOFED from: http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers (the one that matches your OS). Then MXM will be compatible with your OS. Thanks, Alina. On Mon, Jan 21, 2013 at 5:00 PM, Francesco Simula < francesco.sim.