>   libopen-rte.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-rte.so.0 (0x2b6e7cb9c000)
>   libopen-pal.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-pal.so.0 (0x2b6e7ce01000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x2b6e7d077000)
>   libnsl.so.1 => /lib64/libnsl.so.1 (0x2b
Ole Nielsen wrote:
Thanks for your suggestion, Gus; we need a way of debugging what is going
on. I am pretty sure the problem lies with our cluster configuration. I
know MPI simply relies on the underlying network. However, we can ping
and ssh to all nodes (and between any pair as well), so it is currently a
mystery.
> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't
> happen until the third iteration. I take that to mean that the basic
> communication works, but that something is saturating. Is there some notion
> of buffer size somewhere in the MPI system that could explain this?
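One low-tech way to see exactly where each rank is stuck is to attach a
debugger to it on the compute node (the pid lookup below is only illustrative,
and you may need sudo depending on ptrace settings):

gdb -p $(pgrep -n mpi_test)
(gdb) bt

Keep in mind that Open MPI polls aggressively while waiting for a message, so
a rank blocked in MPI_Recv will still show 100% CPU; the backtrace, not the
CPU usage, tells you where it really is.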
On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote:
> Hi all - and sorry for the multiple postings, but I have more information.
+1 on Eugene's comments. The test program looks fine to me.
FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler
allows you to just:
mpicc mpi_test.c -o mpi_test
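If you are curious what the wrapper adds behind the scenes, you can ask it to
print the underlying command line without compiling anything:

mpicc -showme

which shows the include paths and libraries (including -lmpi) that it passes
to the back-end compiler.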
Hi all - and sorry for the multiple postings, but I have more information.
1: After a reboot of two nodes I ran again, and the inter-node freeze didn't
happen until the third iteration. I take that to mean that the basic
communication works, but that something is saturating. Is there some notion
of buffer size somewhere in the MPI system that could explain this?
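For what it is worth, the TCP BTL does expose buffer- and eager-limit-related
parameters that can be inspected (and overridden) via MCA settings; ompi_info
lists them, e.g.:

ompi_info --param btl tcp | grep -i -e eager -e buf

The relevant knobs are along the lines of btl_tcp_eager_limit and
btl_tcp_sndbuf/btl_tcp_rcvbuf; check the actual ompi_info output for the exact
names and defaults in 1.4.3.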
Thanks for your suggestion, Gus; we need a way of debugging what is going on.
I am pretty sure the problem lies with our cluster configuration. I know MPI
simply relies on the underlying network. However, we can ping and ssh to all
nodes (and between any pair as well), so it is currently a mystery.
./mpi_test
So, maybe this helps you.
Best,
Devendra Rai
From: Ole Nielsen
To: us...@open-mpi.org
Sent: Monday, 19 September 2011, 10:59
Subject: [OMPI users] MPI hangs on multiple nodes
The test program is available here:
http://code.google.com/p/pypar/source/browse/source/mpi_test.c
Hopefully, someone can help us troubleshoot why communications stop when
multiple nodes are involved and CPU usage goes to 100% for as long as we
leave the program running.
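For anyone who prefers not to follow the link, the program boils down to a
loop like the sketch below (a simplified reconstruction, not the exact file;
buffer size and iteration count are illustrative):

/*
 * Simplified sketch of the kind of exchange mpi_test.c performs: every
 * iteration, each rank prints a message and then exchanges a buffer with
 * its neighbours in a ring, so traffic has to cross between nodes as soon
 * as more than one node is involved.
 */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define COUNT (1 << 20)   /* doubles per message (illustrative size) */
#define ITERATIONS 10

int main(int argc, char *argv[])
{
    int rank, size, i;
    double *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = calloc(COUNT, sizeof(double));
    recvbuf = calloc(COUNT, sizeof(double));

    for (i = 0; i < ITERATIONS; i++) {
        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;

        printf("P%d: iteration %d, sending to P%d\n", rank, i, next);
        fflush(stdout);

        /* Combined send/receive avoids the deadlock that a naive blocking
         * MPI_Send followed by MPI_Recv can produce once messages are too
         * large for the eager protocol. */
        MPI_Sendrecv(sendbuf, COUNT, MPI_DOUBLE, next, 0,
                     recvbuf, COUNT, MPI_DOUBLE, prev, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    printf("P%d: done\n", rank);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}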
Many thanks
Ole Nielsen
Further to the posting below, I can report that the test program (attached -
this time correctly) is chewing up CPU time on both compute nodes for as
long as I care to let it continue.
It would appear that the hang occurs in MPI_Recv, which is the next call
after the print statements in the test program.
Hi all
We have been using OpenMPI for many years with Ubuntu on our 20-node
cluster. Each node has two quad-core CPUs, so we usually run up to 8
processes on each node, up to a maximum of 160 processes.
However, we just upgraded the cluster to Ubuntu 11.04 with Open MPI 1.4.3
and have come across a problem.
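A typical launch for that layout uses a hostfile with eight slots per node
(node names below are placeholders):

node01 slots=8
node02 slots=8
...
node20 slots=8

and then something like:

mpirun -np 160 --hostfile hosts ./mpi_test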