Hi,
     I am getting an "oob-tcp: Communication retries exceeded" error
message when I run a code with 238 MPI slaves:


/opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
--mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix
/usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered
an error:

Error name: Unknown error: 1
Node: ln10

when attempting to start process rank 234.
--------------------------------------------------------------------------
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0] ORTE_ERROR_LOG: Unreachable in file
orted/orted_comm.c at line 130
[ln13:27867] [[61748,0],0] ORTE_ERROR_LOG: Unreachable in file
orted/orted_comm.c at line 130
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
exceeded.  Can not communicate with peer

Any help would be greatly appreciated.

Sincerely,

Waris Sindhi
High Performance Computing, TechApps
Pratt & Whitney, UTC
(860)-565-8486

