Ole Nielsen wrote:
Thanks for your suggestion, Gus; we need a way of debugging what is going on. I am pretty sure the problem lies with our cluster configuration. I know MPI simply relies on the underlying network, but we can ping and ssh to all nodes (and between any pair of them as well), so it is currently a mystery why MPI doesn't communicate across nodes on our cluster.
Two further questions for the group

   1. I would love to run the test program connectivity.c, but cannot
      find it anywhere. Can anyone help please?

If you downloaded the OpenMPI tarball, it is in examples/connectivity.c
under wherever you untarred it [not where you installed it].
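
In case it helps while you look for the file, here is a rough sketch of
the same idea (this is NOT the actual examples/connectivity.c, just an
illustration written from memory): every rank exchanges a small message
with every other rank, so a hang or error tells you which pair of nodes
cannot talk to each other.

    /* connectivity-style sketch: every pair of ranks exchanges one int */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, peer, token;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (peer = 0; peer < size; peer++) {
            if (peer == rank)
                continue;
            if (rank < peer) {          /* lower rank sends first */
                MPI_Send(&rank,  1, MPI_INT, peer, 0, MPI_COMM_WORLD);
                MPI_Recv(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {                    /* higher rank receives first */
                MPI_Recv(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&rank,  1, MPI_INT, peer, 0, MPI_COMM_WORLD);
            }
            printf("rank %d talked to rank %d\n", rank, peer);
        }

        MPI_Finalize();
        return 0;
    }

If something like this hangs when two ranks sit on different nodes, the
problem is the TCP path between those nodes, not MPI itself.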


   2. After leaving the job hanging overnight, we got the message

      [node5][[9454,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
      mca_btl_tcp_frag_recv: readv failed: Connection timed out (110).

      Does anyone know what this means?


Cheers and thanks
Ole
PS - I don't see how separate buffers would help. Recall that the test program I use works fine on other installations, and indeed when run on the cores of a single node.


It probably won't help, as Eugene explained.
Your program works here, and it also worked for Davendra Rai.
If you were using MPI_Isend [non-blocking],
then you would need separate buffers.

For large amounts of data and many processes,
I would rather use non-blocking communication [and separate
buffers], especially if you do work between send and recv.
But that's not what hangs your program.
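
For example, here is a minimal sketch of what I mean, with separate send
and receive buffers and a ring-neighbor pattern made up just for
illustration (it is not taken from your code):

    #include <mpi.h>

    #define N 1000000    /* arbitrary message size for the example */

    int main(int argc, char *argv[])
    {
        static double sendbuf[N], recvbuf[N];   /* two distinct buffers */
        int rank, size, left, right;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        right = (rank + 1) % size;
        left  = (rank - 1 + size) % size;

        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... do useful computation here while the messages are in flight ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        MPI_Finalize();
        return 0;
    }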

Gus Correa





Message: 11
Date: Mon, 19 Sep 2011 10:37:02 -0400
From: Gus Correa <g...@ldeo.columbia.edu>
Subject: Re: [OMPI users] RE: MPI hangs on multiple nodes
To: Open MPI Users <us...@open-mpi.org>

Hi Ole

You could try the examples/connectivity.c program in the
OpenMPI source tree, to test whether everything is all right.
It also hints at how to solve the buffer re-use issue
that Sebastien [rightly] pointed out [i.e., declare separate
buffers for MPI_Send and MPI_Recv].
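
Something along these lines, just as an illustration of the separate
buffers (the variable names are mine, not from Ole's code, and it needs
at least two processes):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        double outbuf = 3.14, inbuf = 0.0;   /* distinct send/recv buffers */
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Send(&outbuf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&inbuf,  1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&inbuf,  1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&outbuf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        printf("rank %d received %g\n", rank, inbuf);
        MPI_Finalize();
        return 0;
    }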

Gus Correa

