Dear Open MPI experts,
A user on our cluster has a problem running a fairly big job:
- the job using 3024 processes (12 per node, 252 nodes) runs fine
- the job using 4032 processes (12 per node, 336 nodes) produces the error
attached below.
Well, the http://www.open-mpi.org/faq/?category=
Hello,
Thank you for this. When I start the MPI test with the option "--mca btl
openib,sm,self" I can start it on one node. But I can't start it on two
nodes. The error is then:
schmitt$ /amd/software/openmpi-1.6.5/cltest/bin/mpirun -n 2 -H
cluster1,cluster2 /worklocal/schmitt/imb/3.2.4/src/IMB-M
It sounds like you have some kind of memory error in your application; you
should run your code through a memory-checking debugger, such as valgrind.
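One common way to do this is to put valgrind between mpirun and the application, so that every rank runs under its own valgrind instance. The following is only a sketch: the process count and the binary name `./my_mpi_app` are placeholders, not values from this thread.

```shell
#!/bin/sh
# Placeholder values: process count and binary name are assumptions.
NP=4
APP="./my_mpi_app"

# Each MPI rank runs under its own valgrind instance; %p expands to the
# process ID, so every rank writes a separate log file.
CMD="mpirun -np $NP valgrind --log-file=valgrind.%p.log $APP"
echo "$CMD"

# Only launch when both tools are on PATH and the binary exists.
if command -v mpirun >/dev/null 2>&1 && command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    $CMD
fi
```

Compiling the application with -g makes the reported stack traces much easier to read.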
On Jul 24, 2013, at 2:44 AM, Zhubq wrote:
>
>>
>> Hi all,
>>
>> I have a problem when calling MPI subroutines in a loop. For example, I have
>>
Sorry for the delay in replying.
That's a really odd one; I haven't seen that before.
I'm afraid that the only real solution I can think of here is to attach a
debugger to your mpirun process (I'm *guessing* it's mpirun that is issuing
that error?) and see why ompi_evesel->dispatch() is failing
On 30.07.2013 at 15:01, christian schmitt wrote:
> I'm trying to get Open MPI (1.6.5) running over InfiniBand.
> My system is CentOS 6.3. I have installed the Mellanox OFED driver
> (2.0) and everything seems to be working. ibhosts shows all hosts and the switch.
> A "hca_self_test.ofed" shows:
>
Hi Christian
If I understand you right, you want to use Open MPI with
InfiniBand, not Ethernet, right?
If that is the case, try
'-mca btl openib,sm,self'
in your mpiexec command line.
I don't think IPoIB is required for Open MPI.
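As a rough sketch, a two-node launch with that option could look like the following. The host names are the ones mentioned elsewhere in this thread; the binary `./my_mpi_app` is a placeholder.

```shell
#!/bin/sh
# Host names as mentioned in the thread; the binary path is a placeholder.
HOSTS="cluster1,cluster2"
APP="./my_mpi_app"

# Restrict Open MPI to the InfiniBand (openib), shared-memory (sm) and
# loopback (self) BTLs, so the TCP BTL is never selected.
CMD="mpirun -n 2 -H $HOSTS --mca btl openib,sm,self $APP"
echo "$CMD"

# Only launch when mpirun is on PATH and the binary exists.
if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    $CMD
fi
```

With the BTL list restricted this way, the job should fail with an error if the openib BTL cannot be used between the two nodes, rather than quietly running over Ethernet.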
See these FAQ entries (the FAQ is the best Open MPI documentation):
http://ww
Hello,
I'm trying to get Open MPI (1.6.5) running over InfiniBand.
My system is CentOS 6.3. I have installed the Mellanox OFED driver
(2.0) and everything seems to be working. ibhosts shows all hosts and the switch.
A "hca_self_test.ofed" shows:
Performing Adapter Device Self Test
Number