If this were a problem with nohup and Open MPI, the problem would
always be reproducible.
But it occurs only intermittently.
Some jobs even get completed without any problem.
While your method starts mpirun itself under nohup, the MPI processes themselves are
not launched that way and therefore run in the foreground. This message
indicates that at least one of those MPI processes received a hangup signal and
aborted. Even though mpirun won't get the signal itself, it does
Hi,
I am randomly getting hangups in an MPI job.
..
...
IT:20760 CF: 0.7743 Time: 1540.0 MaxMin:20.69/5 :20.12/12
IT:20770 CF: 0.7734 Time: 1560.2 MaxMin:20.50/1 :19.31/5
--
mpirun noticed that