Hi,

My MPI job is randomly killed by a hangup signal (signal 1), as the output below shows.


..............
...........
  IT:20760 CF:   0.7743 Time:  1540.0 MaxMin:20.69/5  :20.12/12
  IT:20770 CF:   0.7734 Time:  1560.2 MaxMin:20.50/1  :19.31/5
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 9399 on node node1 exited on
signal 1 (Hangup).
--------------------------------------------------------------------------
[node1:09356] filem:rsh: close()
[node1:09356] mca: base: close: component rsh closed
[node1:09356] mca: base: close: unloading component rsh
[node1:09356] mca: base: close: component default closed
[node1:09356] mca: base: close: unloading component default
[node1:09356] mca: base: close: component hnp closed
[node1:09356] mca: base: close: unloading component hnp
[node1:09356] mca: base: close: component round_robin closed
[node1:09356] mca: base: close: unloading component round_robin
[node1:09356] mca: base: close: component rsh closed
[node1:09356] mca: base: close: unloading component rsh
[node1:09356] mca: base: close: component default closed
[node1:09356] mca: base: close: unloading component default
[node1:09356] mca: base: close: component bad closed
[node1:09356] mca: base: close: unloading component bad
[node1:09356] mca: base: close: unloading component binomial
[node1:09356] mca: base: close: component tcp closed
[node1:09356] mca: base: close: unloading component tcp
[node1:09356] mca: base: close: component oob closed
[node1:09356] mca: base: close: unloading component oob
[node1:09356] mca: base: close: unloading component auto_detect
[node1:09356] mca: base: close: unloading component linux

I am using Open MPI version 1.2.7 over InfiniBand.
The application was running on 15 nodes.

The job is started with nohup so that it runs in the background.

Thanks in advance
Harichand M V
