Hi, I am getting random hangups in an MPI job.

.............. ...........
IT:20760 CF: 0.7743 Time: 1540.0 MaxMin:20.69/5 :20.12/12
IT:20770 CF: 0.7734 Time: 1560.2 MaxMin:20.50/1 :19.31/5
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 9399 on node node1 exited on signal 1 (Hangup).
--------------------------------------------------------------------------
[node1:09356] filem:rsh: close()
[node1:09356] mca: base: close: component rsh closed
[node1:09356] mca: base: close: unloading component rsh
[node1:09356] mca: base: close: component default closed
[node1:09356] mca: base: close: unloading component default
[node1:09356] mca: base: close: component hnp closed
[node1:09356] mca: base: close: unloading component hnp
[node1:09356] mca: base: close: component round_robin closed
[node1:09356] mca: base: close: unloading component round_robin
[node1:09356] mca: base: close: component rsh closed
[node1:09356] mca: base: close: unloading component rsh
[node1:09356] mca: base: close: component default closed
[node1:09356] mca: base: close: unloading component default
[node1:09356] mca: base: close: component bad closed
[node1:09356] mca: base: close: unloading component bad
[node1:09356] mca: base: close: unloading component binomial
[node1:09356] mca: base: close: component tcp closed
[node1:09356] mca: base: close: unloading component tcp
[node1:09356] mca: base: close: component oob closed
[node1:09356] mca: base: close: unloading component oob
[node1:09356] mca: base: close: unloading component auto_detect
[node1:09356] mca: base: close: unloading component linux

I am using Open MPI version 1.2.7 over InfiniBand. I was running the application on 15 nodes. The job is started with nohup so that it runs in the background.

Thanks in advance,
Harichand M V