-start failed processes on
backup nodes without losing the current query.
What are your thoughts?
Regards,
Randolph
PS: excellent product, keep up the good work
--- On Thu, 24/6/10, Ralph Castain wrote:
From: Ralph Castain
Subject: Re: [OMPI users] more Bugs in MPI_Abort() -- mpirun
To: "Open MPI Users"
Received: Thursday,
node is powered off and can never exit as it appears to
wait indefinitely for the missing node to receive or send a signal.
--- On Wed, 23/6/10, Jeff Squyres wrote:
From: Jeff Squyres
Subject: Re: [OMPI users] more Bugs in MPI_Abort() -- mpirun
To: "Open MPI Users"
Received: Wednesday, 23 June, 2010, 9:10 PM
Open MP
Are you implying I should call exit() rather than MPI_Abort?
--- On Wed, 23/6/10, David Zhang wrote:
From: David Zhang
Subject: Re: [OMPI users] more Bugs in MPI_Abort() -- mpirun
To: "Open MPI Users"
Received: Wednesday, 23 June, 2010, 4:37 PM
Since you turned the machine off instead of just killing one of the
processes, no signals could be sent to other processes. Perhaps you could
institute some sort of handshaking in your software that periodically checks
for the attendance of all machines, and times out if not all are present
within so
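A rough sketch of the timeout half of such a handshake is below. It is deliberately MPI-free so that it compiles on its own: `wait_for_peer`, `poll_fn` and `countdown_ready` are hypothetical names, not part of Open MPI. In a real program the polling callback would presumably wrap MPI_Test() on a request posted with MPI_Irecv(); that wiring is an assumption and is not shown. A timeout also only tells the caller that a peer is missing, and it does not by itself make mpirun exit cleanly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true once the peer has answered.
 * In an MPI program this could wrap MPI_Test() on a pending request
 * (an assumption; no MPI calls are made in this sketch). */
typedef bool (*poll_fn)(void *ctx);

/* Monotonic clock in seconds, unaffected by wall-clock changes. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Poll until the peer answers or timeout_sec elapses.
 * Returns true if the peer answered, false on timeout. */
static bool wait_for_peer(poll_fn poll, void *ctx, double timeout_sec)
{
    assert(poll != NULL);
    const double deadline = now_seconds() + timeout_sec;
    const struct timespec pause = { 0, 1000000L };  /* sleep 1 ms between polls */

    do {
        if (poll(ctx))
            return true;
        nanosleep(&pause, NULL);
    } while (now_seconds() < deadline);

    return false;
}

/* Stand-in predicate for trying the helper without MPI:
 * reports "ready" once the counter behind ctx has reached zero. */
static bool countdown_ready(void *ctx)
{
    int *left = ctx;
    if (*left > 0)
        --*left;
    return *left == 0;
}
```

Each rank could call `wait_for_peer` once per expected peer and treat a `false` return as "that machine is absent", then decide for itself whether to carry on with the remaining nodes or shut down.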
I have an MPI program that aggregates data from multiple SQL systems. It all
runs fine. To test fault tolerance I switch one of the machines off while it
is running. The result is always a hang, i.e. mpirun never completes.
To try and avoid this I have replaced the send and receive calls with