Hi all, I have a problem with the cluster I'm currently using: from time to time, a node 'hangs' silently during an MPI call. The MPI processes waiting on that node then stay blocked indefinitely, and the only way to detect the failure is to notice that no more output is being written to the log files. We're trying to resolve the underlying cause of the hangs, but in the meantime, is there a way to set a timeout, or something similar, to detect this situation? Sorry if this has been addressed before; I searched the FAQ and archives and didn't come up with anything.
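As a stopgap, the manual check described above (noticing that the logs have stopped growing) can be automated outside MPI entirely. Below is a minimal sketch, assuming each MPI job writes to ordinary log files on a filesystem the watchdog can see. The function names, the timeout, and the polling interval are hypothetical, and a quiet log is only a heuristic for a hang, not proof of one.

```python
import os
import time
from typing import Iterable, List, Optional


def stale_logs(paths: Iterable[str], timeout_s: float,
               now: Optional[float] = None) -> List[str]:
    """Return the log files not modified within the last timeout_s seconds.

    A log that has gone quiet for longer than the timeout is treated as a
    possible sign that the process writing it is hung. Missing files are
    reported as stale as well.
    """
    if now is None:
        now = time.time()
    stale = []
    for path in paths:
        try:
            last_write = os.path.getmtime(path)  # last modification time (s)
        except FileNotFoundError:
            stale.append(path)
            continue
        if now - last_write > timeout_s:
            stale.append(path)
    return stale


def watch(paths: List[str], timeout_s: float = 600.0,
          poll_s: float = 60.0) -> None:
    """Poll forever, printing a warning whenever a log goes quiet."""
    while True:
        for path in stale_logs(paths, timeout_s):
            print(f"WARNING: no output to {path} for over {timeout_s:.0f} s")
        time.sleep(poll_s)
```

For example, `watch(["run.rank0.log", "run.rank1.log"], timeout_s=900)` (hypothetical file names) would warn once a log had been silent for 15 minutes. The warning could instead trigger a job kill or a page. Choosing the timeout requires knowing how long the longest legitimate gap between log writes can be.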
Thanks,
-Sam

--
--------------------
J. Samuel Preston
Research Assistant
Scientific Computing and Imaging Institute
University of Utah