[OMPI users] Big job, InfiniBand, MPI_Alltoallv and ibv_create_qp failed

2013-07-30 Thread Paul Kapinos
Dear Open MPI experts, A user on our cluster has a problem running a fairly big job: (- the job using 3024 processes (12 per node, 252 nodes) runs fine) - the job using 4032 processes (12 per node, 336 nodes) produces the error attached below. Well, the http://www.open-mpi.org/faq/?category=

Re: [OMPI users] openmpi+infiniband

2013-07-30 Thread christian schmitt
Hello, Thank you for this. When I start the MPI test with the option "--mca btl openib,sm,self", I can start it on one node, but I can't start it on two nodes. The error then is: schmitt$ /amd/software/openmpi-1.6.5/cltest/bin/mpirun -n 2 -H cluster1,cluster2 /worklocal/schmitt/imb/3.2.4/src/IMB-M

Re: [OMPI users] MPI error in a loop

2013-07-30 Thread Jeff Squyres (jsquyres)
It sounds like you have some kind of memory error in your application; you should run your code through a memory-checking debugger, such as valgrind. On Jul 24, 2013, at 2:44 AM, Zhubq wrote: >> Hi all, >> >> I have a problem when calling MPI subroutines in a loop. For example, I have >>

Re: [OMPI users] ompi_evesel->dispatch() failed when running from Java Process Builder

2013-07-30 Thread Jeff Squyres (jsquyres)
Sorry for the delay in replying. That's a really odd one; I haven't seen that before. I'm afraid that the only real solution I can think of here is to attach a debugger to your mpirun process (I'm *guessing* it's mpirun that is issuing that error?) and see why ompi_evesel->dispatch() is failing

Re: [OMPI users] openmpi+infiniband

2013-07-30 Thread Reuti
On 30.07.2013 at 15:01, christian schmitt wrote: > I'm trying to get Open MPI (1.6.5) running over InfiniBand. > My system is CentOS 6.3. I have installed the Mellanox OFED driver > (2.0) and everything seems to be working. ibhosts shows all hosts and the switch. > A "hca_self_test.ofed" shows: >

Re: [OMPI users] openmpi+infiniband

2013-07-30 Thread Gus Correa
Hi Christian, If I understand you correctly, you want to use Open MPI with InfiniBand, not Ethernet, right? If that is the case, try '-mca btl openib,sm,self' in your mpiexec command line. I don't think IPoIB is required for Open MPI. See these FAQ entries (the FAQ is the best Open MPI documentation): http://ww

[OMPI users] openmpi+infiniband

2013-07-30 Thread christian schmitt
Hello, I'm trying to get Open MPI (1.6.5) running over InfiniBand. My system is CentOS 6.3. I have installed the Mellanox OFED driver (2.0) and everything seems to be working. ibhosts shows all hosts and the switch. A "hca_self_test.ofed" shows: Performing Adapter Device Self Test Number