Hi all,

I have a problem with the cluster I'm currently using: nodes 'hang'
silently from time to time during an MPI call.  The affected MPI
processes then block indefinitely, and the only way to detect the
error is to notice that no more output is being written to the log
files.  We're trying to resolve the underlying cause of the hangs, but
in the meantime, is there a way to set a timeout or something similar
to detect this situation?  Sorry if this has been addressed before; I
searched the FAQ and archives and didn't come up with anything.
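For concreteness, here is a minimal sketch of the kind of timeout meant here. It is a generic watchdog, not an MPI API. A timer is armed before a call that might hang and disarmed when the call returns. If the call never returns, a handler runs on a background thread and can at least log the hang instead of leaving silent logs. The sketch uses only the Python standard library. `watchdog` and `report_hang` are hypothetical names. Aborting the job from the handler (e.g. with mpi4py's `MPI.COMM_WORLD.Abort()`) is an assumption and is not exercised here.

```python
import threading
from contextlib import contextmanager


@contextmanager
def watchdog(timeout_s, on_timeout):
    """Arm a timer around a block that might hang.

    If the with-block has not exited within timeout_s seconds, on_timeout()
    is called from a background thread. The blocked call itself is NOT
    interrupted; on_timeout is responsible for reporting (and, in a real
    job, possibly aborting) the hang.
    """
    timer = threading.Timer(timeout_s, on_timeout)
    timer.daemon = True  # a pending timer must not keep the process alive
    timer.start()
    try:
        yield
    finally:
        # The block finished (normally or via an exception): disarm the timer.
        # cancel() has no effect if on_timeout has already started running.
        timer.cancel()


def report_hang():
    # Hypothetical handler. With mpi4py one might call
    # MPI.COMM_WORLD.Abort(1) here (assumption, not exercised in this sketch).
    print("watchdog: call exceeded its deadline", flush=True)
```

Usage would look like `with watchdog(600, report_hang): comm.Barrier()`, where `comm` is assumed to be an mpi4py communicator and 600 s an arbitrary deadline. The handler runs on a separate thread. Calling back into MPI from it may therefore require the library to be initialized with sufficient thread support. It is worth checking the MPI implementation's documentation before relying on that.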

Thanks,
-Sam

-- 
--------------------
J. Samuel Preston
Research Assistant
Scientific Computing and Imaging Institute
University of Utah
