[OMPI users] "OpenIB unable to find any HCAs": Why is this shown on a single SMP machine?

2007-09-19 Thread Tobias Burnus
If I start an MPI job with "mpirun -np 2", the following message is shown: -- [0,1,0]: OpenIB on host tux was unable to find any HCAs. Another transport will be used instead, although this may result in lower performance. --

[OMPI users] SKaMPI hangs on collectives and onesided

2007-09-19 Thread Edmund Sumbar
I'm trying to run skampi-5.0.1-r0191 under PBS over IB with the command line
    mpirun -np 2 ./skampi -i coll.ski -o coll_ib.sko
The pt2pt and mmisc tests run to completion. The coll and onesided tests, on the other hand, start to produce output but then seem to hang. Actually, the CPUs appear to

Re: [OMPI users] SKaMPI hangs on collectives and onesided

2007-09-19 Thread Gleb Natapov
On Wed, Sep 19, 2007 at 01:58:35PM -0600, Edmund Sumbar wrote:
> I'm trying to run skampi-5.0.1-r0191 under PBS
> over IB with the command line
>
>    mpirun -np 2 ./skampi -i coll.ski -o coll_ib.sko
Can you add choose_barrier_synchronization() to coll.ski and try again? It looks like this one: h

Re: [OMPI users] Application using OpenMPI 1.2.3 hangs, error messages in mca_btl_tcp_frag_recv

2007-09-19 Thread Daniel Rozenbaum
I'm now running the same experiment under valgrind. It's probably going to run for a few days, but interestingly, while running under valgrind's memcheck, the app has been reporting many more of these "recv failed" errors, and not only on the server node: [host1][0,1

Re: [OMPI users] SKaMPI hangs on collectives and onesided

2007-09-19 Thread Jelena Pjesivac-Grbovic
The suggestion will probably work, but it is not a solution. "Choosing barrier synchronization" is not recommended by the SKaMPI team, and it reduces the accuracy of the benchmark. The problem is either at the pml ob1 level or at the btl ib level, and it has to do with many messages being sent at the same