Re: [OMPI users] Fault Tolerant Method

2006-07-28 Thread Ralph Castain
Actually, we had a problem in our implementation that caused the system to continually reuse the same machine allocations for each "spawn" request. In other words, we always started with the top of the machine_list whenever your program called comm_spawn. This appears to have been the source of th

Re: [OMPI users] Fault Tolerant Method

2006-07-28 Thread Edgar Gabriel
Furthermore, don't forget that to use this fault-tolerance approach successfully, the parents and the other child processes must not be affected by the death/failure of another child process. Right now in Open MPI, if one of the child processes (which you spawned using MPI_Comm_spawn) fails, th

Re: [OMPI users] Fault Tolerant Method

2006-07-28 Thread Josh Hursey
> I have implemented the fault tolerance method in which you would use MPI_COMM_SPAWN to dynamically create communication groups and use those communicators for a form of process fault tolerance (as described by William Gropp and Ewing Lusk in their 2004 paper), but am having some problems

[OMPI users] Fault Tolerant Method

2006-07-28 Thread bdickinson
I have implemented the fault tolerance method in which you would use MPI_COMM_SPAWN to dynamically create communication groups and use those communicators for a form of process fault tolerance (as described by William Gropp and Ewing Lusk in their 2004 paper), but am having some problems getting i