I have 3 nodes: one is the master node and the other two are compute nodes.
These nodes are deployed on the Internet (not in a cluster).


When I run NPB (the NAS Parallel Benchmarks) on one node (using 2 processes):
 mpirun -np 2 exe
I get a successful result, but when I run it on two nodes (for example,
on nodes B and C), it fails:
 mpirun -nolocal -hostfile hostfile -np 2 exe
The failure message is:
B [0,1,0] connectimeout ,connect() fail errno=110 
C [0,1,1] connectimeout ,connect() fail errno=110
But the connection between B and C itself seems fine, because I can use ping
and ssh from B to C (or from C to B).
I think this problem may be caused by the connect timeout parameter (perhaps it
is so small that the connection fails?), because my nodes are deployed on the
Internet, so the delay is larger.
Can anyone help me solve this problem, and tell me how to set the connect
timeout in Open MPI?
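
For reference, here is a minimal sketch (Python, a helper of my own, not part
of Open MPI) of a check that goes beyond ping and ssh: it tries a plain TCP
connect to a given port on the other node and reports whether it succeeds or
which errno it fails with. On Linux, errno=110 is ETIMEDOUT. The function name
check_tcp, the host name and the port are placeholders, and the assumption
behind the check is that the MPI processes connect to each other on TCP ports
other than the one ssh uses.

```python
import errno
import socket


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect; return "ok" or the symbolic errno name."""
    try:
        # create_connection resolves the host and performs connect()
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer within `timeout` seconds (same symptom as errno 110)
        return "ETIMEDOUT"
    except OSError as exc:
        # e.g. ECONNREFUSED when the host is reachable but nothing listens
        return errno.errorcode.get(exc.errno, str(exc))
```

Called from B as check_tcp("C", 22), it should return "ok" if ssh works. Trying
it against some other port on C where a test listener is running would show
whether connections to non-ssh ports get through or time out.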


