Which version of OMPI are you using?
> On Jun 16, 2016, at 2:25 PM, Alex Kaiser <adkai...@gmail.com> wrote:
>
> Hello,
>
> I have an MPI code that sometimes hangs: it simply stops making progress.
> It is not clear why, and because it uses many large third-party libraries
> I do not want to try to fix it. The code is easy to restart, but then I
> have to monitor it closely, and I would prefer to automate that.
>
> Is there a way, within an MPI process, to detect a hang and abort when one occurs?
>
> In pseudocode, I would like to do something like:
>   for (all time steps)
>     step
>     if (nothing has happened for x minutes)
>       call mpi abort to return control to the shell
>     endif
>   endfor
> This structure does not work, however: once the process is stuck it can
> no longer do anything, including check itself.
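
Since a stuck process cannot check itself, one option is to move the check
outside the MPI job. Below is a minimal sketch of that idea, not a tested
recipe for any particular Open MPI version. It assumes the application can
be made to touch a "heartbeat" file once per time step (for example from
rank 0), and that terminating the launcher also tears down the ranks; both
are assumptions to verify on your system. The function name
run_with_watchdog and its parameters are hypothetical.

```python
import os
import subprocess
import time


def run_with_watchdog(cmd, heartbeat_path, timeout_s, poll_s=1.0):
    """Run cmd and kill it if heartbeat_path stops changing for timeout_s seconds.

    Returns the exit code of cmd, or None if the watchdog had to kill it.
    """
    proc = subprocess.Popen(cmd)
    last_progress = time.monotonic()
    last_mtime = None
    while True:
        try:
            # Wait up to one polling interval for the job to finish on its own.
            return proc.wait(timeout=poll_s)
        except subprocess.TimeoutExpired:
            pass

        # Read the heartbeat timestamp; a missing file counts as "no heartbeat yet".
        try:
            mtime = os.stat(heartbeat_path).st_mtime_ns
        except FileNotFoundError:
            mtime = None

        if mtime != last_mtime:
            # The application reported progress since the last poll.
            last_mtime = mtime
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > timeout_s:
            # No progress for too long: ask politely first, then force.
            proc.terminate()
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
            return None
```

A wrapper script could call this in a loop, for instance with a command
such as ["mpirun", "-np", "4", "./sim"] (a placeholder), and relaunch the
job whenever the function returns None. Keeping the watchdog in a separate
process means it still runs when every rank is blocked inside a library
call.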
>
> Thank you,
> Alex
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29471.php