On 29/10/14 18:11, Steven Chow wrote:

> In the running process, if one node crashed, then the WHOLE job would be
> killed on all allocated nodes.

What does the output look like when a node fails?

It's possible that you're seeing OMPI trigger an abort due to a node
failure and then Slurm decides to clean things up for you at that point.

--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci