On 29/10/14 18:11, Steven Chow wrote:

> In the running process, if one node crashed, then the WHOLE job would be
> killed on all allocated nodes.

What does the output look like when a node fails?

It's possible that you're seeing Open MPI (OMPI) trigger an abort due to the
node failure, with Slurm then cleaning up the rest of the job at that point.

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
