Re: [OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread John Hearns
Grzegorz, sometimes when a parallel application quits there are processes left running on the compute nodes. You can usually find these by running 'pgrep -P 1' and excluding any processes owned by root. These 'orphan' processes use up memory - so if you are having problems with applications quittin
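As a rough illustration of the 'pgrep -P 1' check above, here is a minimal Python sketch. It assumes a Linux node with /proc. It lists processes whose parent PID is 1 (orphans re-parented to init) and skips those owned by root (UID 0). The function names find_orphans and read_proc_table are hypothetical helpers, not part of Open MPI. Not every match is a left-over MPI rank, so review the list before killing anything. On systems that use a child subreaper, orphans may be re-parented somewhere other than PID 1.

```python
import os


def find_orphans(procs):
    """Return sorted PIDs re-parented to PID 1 (init) and not owned by root.

    `procs` is an iterable of (pid, ppid, uid) tuples.
    """
    return sorted(pid for pid, ppid, uid in procs if ppid == 1 and uid != 0)


def read_proc_table(proc_root="/proc"):
    """Linux only: collect (pid, ppid, real uid) from /proc/<pid>/status."""
    table = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited between listdir() and open()
        if "PPid" not in fields or "Uid" not in fields:
            continue  # incomplete read, e.g. the process vanished mid-read
        ppid = int(fields["PPid"])  # int() tolerates the surrounding whitespace
        uid = int(fields["Uid"].split()[0])  # first Uid column is the real UID
        table.append((int(entry), ppid, uid))
    return table


if __name__ == "__main__" and os.path.isdir("/proc"):
    # Print candidate orphan PIDs on this node, one per line.
    for pid in find_orphans(read_proc_table()):
        print(pid)
```

Running it on each compute node after a job finishes gives the same candidate list as 'pgrep -P 1' filtered by owner. The pure find_orphans function is kept separate so the filtering rule can be checked without a live /proc.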

2012-03-27 Thread Grzegorz Maj
John, thank you for your reply. I checked the system logs and there are no signs of the OOM killer. What do you mean by cleaning 'orphan' processes? Should I check whether any processes are left after each job execution? I have always assumed that when mpirun terminates, everything is cleaned

2012-03-27 Thread John Hearns
Have you checked the system logs on the machines where this is running? Could it be that the processes use a lot of memory and the Out Of Memory (OOM) killer is killing them? Also check all nodes for left-over 'orphan' processes which are still running after a job finishes - these should be killed
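To make the log check above concrete, here is a minimal Python sketch that scans kernel log text for OOM-killer activity. It assumes the kernel reports OOM kills with lines containing "oom-killer", "Out of memory", or "Kill process <pid>" / "Killed process <pid>". The exact wording varies between kernel versions, so treat the patterns as a starting point. The log source is also an assumption: for example, a saved copy of dmesg output, or /var/log/messages, whose path depends on the distribution. scan_for_oom is a hypothetical helper name.

```python
import re
import sys

# Assumption: OOM kills appear as lines mentioning "oom-killer" / "Out of memory"
# and "Kill process <pid>" or "Killed process <pid>"; wording differs across kernels.
OOM_MARKERS = ("oom-killer", "Out of memory")
KILLED_PID = re.compile(r"Kill(?:ed)? process (\d+)")


def scan_for_oom(lines):
    """Return (OOM-related lines, set of PIDs the OOM killer reports killing)."""
    hits, pids = [], set()
    for line in lines:
        match = KILLED_PID.search(line)
        if match or any(marker in line for marker in OOM_MARKERS):
            hits.append(line.rstrip("\n"))
        if match:
            pids.add(int(match.group(1)))
    return hits, pids


if __name__ == "__main__":
    # Usage: python3 oom_scan.py <log file> [...]  (log paths are site-specific)
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            hits, pids = scan_for_oom(log)
        print(f"{path}: {len(hits)} OOM-related lines, killed PIDs: {sorted(pids)}")
```

If a PID reported here matches an MPI rank or an orted daemon from the failed job, the OOM killer is a likely cause. An empty result, as Grzegorz found, points elsewhere. "Out of memory" can also come from user-space programs, so check that matching lines come from the kernel.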