Which version of OMPI are you using?

> On Jun 16, 2016, at 2:25 PM, Alex Kaiser <adkai...@gmail.com> wrote:
> 
> Hello, 
> 
> I have an MPI code that sometimes hangs; it simply stops making progress. It is not
> clear why, and because it uses many large third-party libraries I do not want to
> try to fix it. The code is easy to restart, but then I have to monitor it
> closely, and I would prefer to do this automatically.
> 
> Is there a way, within an MPI process, to detect the hang and abort if so? 
> 
> In pseudocode, I would like to do something like this:
> for (all time steps)
>     step 
>     if (nothing has happened for x minutes)
>         call mpi abort to return control to the shell
>     endif 
> endfor 
> This structure does not work, however: once the process is stuck, it can no
> longer do anything, including check itself.
> 
> Thank you, 
> Alex 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/06/29471.php
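
One common workaround, sketched here under stated assumptions rather than taken from Open MPI, is a watchdog thread. The main loop records a timestamp after each step, and a second thread aborts the process if that timestamp gets too old. Because the watchdog runs in its own thread, it keeps checking even while the main thread is stuck inside a library or MPI call. All names below (watchdog_start, watchdog_kick, watchdog_stop, watchdog_abort, watchdog_fired) are hypothetical. The sketch assumes POSIX threads, C11 atomics, and that mpirun terminates the remaining ranks once one process dies abnormally, which as far as I know is Open MPI's default behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: none of these names come from Open MPI or MPI. */
typedef void (*watchdog_action)(void);

static atomic_llong wd_last_kick_ms;  /* monotonic time of the last kick */
static atomic_bool wd_running;
static atomic_bool wd_fired;
static long long wd_timeout_ms;
static watchdog_action wd_action;
static pthread_t wd_thread;

static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    /* Resume the sleep if a signal interrupts it. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
    }
}

/* Call after every time step to report that the run is still progressing. */
void watchdog_kick(void) {
    atomic_store(&wd_last_kick_ms, now_ms());
}

bool watchdog_fired(void) {
    return atomic_load(&wd_fired);
}

/* Real-run action: kill this rank so that mpirun notices and ends the job.
   abort() is used instead of MPI_Abort, because this thread may only call
   MPI safely if MPI was initialized with MPI_THREAD_MULTIPLE. */
void watchdog_abort(void) {
    fprintf(stderr, "watchdog: no progress for %lld ms, aborting\n", wd_timeout_ms);
    abort();
}

static void *watchdog_loop(void *arg) {
    (void)arg;
    long long poll_ms = wd_timeout_ms / 10 > 0 ? wd_timeout_ms / 10 : 1;
    while (atomic_load(&wd_running)) {
        if (now_ms() - atomic_load(&wd_last_kick_ms) > wd_timeout_ms) {
            atomic_store(&wd_fired, true);
            if (wd_action != NULL)
                wd_action();          /* e.g. watchdog_abort */
            atomic_store(&wd_running, false);
            break;
        }
        sleep_ms((long)poll_ms);
    }
    return NULL;
}

/* Start the watchdog thread. A NULL action only records the timeout
   (useful for testing); pass watchdog_abort in a real run. Returns 0 on success. */
int watchdog_start(long long timeout_ms, watchdog_action action) {
    assert(timeout_ms > 0);
    wd_timeout_ms = timeout_ms;
    wd_action = action;
    atomic_store(&wd_fired, false);
    watchdog_kick();
    atomic_store(&wd_running, true);
    return pthread_create(&wd_thread, NULL, watchdog_loop, NULL);
}

/* Stop the watchdog before MPI_Finalize; call at most once per start. */
void watchdog_stop(void) {
    atomic_store(&wd_running, false);
    pthread_join(wd_thread, NULL);
}
```

In the loop from the question, one would call watchdog_start(x * 60 * 1000, watchdog_abort) once after MPI_Init, watchdog_kick() after each step, and watchdog_stop() before MPI_Finalize, compiling with -pthread. The timeout should comfortably exceed the longest legitimate step; otherwise a slow but healthy step will kill the run.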
