Hi, I am getting an "oob-tcp: Communication retries exceeded" error message when I run an MPI code with 238 slave processes:

/opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup

--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error:

Error name: Unknown error: 1
Node: ln10

when attempting to start process rank 234.
--------------------------------------------------------------------------
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0] ORTE_ERROR_LOG: Unreachable in file orted/orted_comm.c at line 130
[ln13:27867] [[61748,0],0] ORTE_ERROR_LOG: Unreachable in file orted/orted_comm.c at line 130
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer
[ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries exceeded. Can not communicate with peer

Any help would be greatly appreciated.

Sincerely,
Waris Sindhi
High Performance Computing, TechApps
Pratt & Whitney, UTC
(860)-565-8486