Re: [OMPI users] Network connection check

Jeff Squyres Thu, 23 Jul 2009 08:07:03 -0400

On Jul 23, 2009, at 7:36 AM, vipin kumar wrote:

I can't use blocking communication routines in my main program ("masterprocess") because any type of network failure (due to physical connectivity, TCP connectivity, or the MPI connection, as you told) may occur. So I am using non-blocking point-to-point communication routines, and TEST later for completion of that Request. Once I enter a TEST loop I will test for Request completion till TIMEOUT. Suppose TIMEOUT has occurred. In this case, first I will check whether

Open MPI should return a failure if TCP connectivity is lost, even with a non-blocking point-to-point operation. The failure should be returned in the call to MPI_TEST (and friends). So I'm not sure your timeout has meaning here -- if you reach the timeout, I think it simply means that the MPI communication has not completed yet. It does not necessarily mean that the MPI communication has failed.

1: Slave machine is reachable or not. (How will I do that? Given: I have the IP address and host name of the slave machine.)

2: if reachable, check whether program (orted and "slaveprocess") is alive or not.

MPI doesn't provide any standard way to check reachability and/or health of a peer process.
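
Outside of MPI, one possible way to approach check 1 is a plain TCP connect with a timeout, using POSIX sockets. The sketch below is only an illustration under assumptions: the function name tcp_reachable is hypothetical, and the port to probe is something you would have to choose yourself (for instance a service you know is listening on the slave machine). A successful connect only shows that the host answers on that port. It says nothing about whether orted or "slaveprocess" is alive.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a TCP connection to host:port completes within timeout_ms,
 * 0 if no connection could be made in that time, and -1 if the host or
 * port cannot be resolved. host may be an IP address or a host name. */
int tcp_reachable(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res, *ai;
    int reachable = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL && !reachable; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking mode, so connect() cannot hang on a dead host. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fd);
            continue;
        }

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            reachable = 1;             /* connected immediately */
        } else if (errno == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, timeout_ms) == 1) {
                /* Socket is writable: read the final connect result. */
                int err = 0;
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0
                    && err == 0)
                    reachable = 1;
            }
        }
        close(fd);
    }

    freeaddrinfo(res);
    return reachable;
}
```

A call such as tcp_reachable(slave_host, "22", 2000) would then probe the SSH port with a two-second limit, where slave_host and the port are placeholders for your own setup.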

That being said, I think some of the academics are working on more fault tolerant / resilient MPI messaging, but I don't know if they're ready to talk about such efforts publicly yet.


--
Jeff Squyres
jsquy...@cisco.com
