Re: [OMPI users] MPI hangs on multiple nodes

2011-09-25 Thread Ole Nielsen
Topics:
1. Re: RE : MPI hangs on multiple nodes (Gus Correa)
2. Typo in MPI_Cart_coords man page (Jeremiah Willcock)
3. Re: RE : MPI hangs on multiple nodes (Gus Correa)
4. How could OpenMPI (or MVAPICH) affect floating-point results? (Blosch, Edwin L)

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
...r MPI. Cheers, Ole Nielsen
--
Here's the output, which shows the freeze in the third iteration:
nielso@alamba:~/sandpit/pypar/source$ mpirun --hostfile /etc/mpihosts --host node5,node6 --npernode 2 a.out
Number of processes = 4
Test repeated 3 times for reliability
I am process 2 on nod

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
Thanks for your suggestion, Gus. We need a way of debugging what is going on. I am pretty sure the problem lies with our cluster configuration. I know MPI simply relies on the underlying network. However, we can ping and ssh to all nodes (and between any pair of nodes as well), so it is currently a mystery

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
The test program is available here: http://code.google.com/p/pypar/source/browse/source/mpi_test.c Hopefully someone can help us troubleshoot why communication stops when multiple nodes are involved, and why CPU usage goes to 100% for as long as we leave the program running. Many thanks, Ole Nielsen

Re: [OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
...anyone else seen this behavior, or can anyone give me a hint on how to troubleshoot? Cheers and thanks, Ole Nielsen
Output:
nielso@alamba:~/sandpit/pypar/source$ mpirun --hostfile /etc/mpihosts --host node17,node18 --npernode 2 a.out
Number of processes = 4
Test repeated 3 times for reliability
I am

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
Cheers and thanks, Ole Nielsen
Test output across two nodes (this one hangs):
--
nielso@alamba:~/sandpit/pypar/source$ mpirun --hostfile /etc/mpihosts --host node17,node18 --npernode 2 a.out
Number of processes = 4
Test repeated 3 times for reliability
I am