Hello All,

I am trying to use the approach explained in
https://stackoverflow.com/questions/15007164/can-mpi-publish-name-be-used-for-two-separately-started-applications/15008715#15008715
but when I start the master and slave instances on different machines I get
the following message:

--------------------------------------------------------------------------
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

  Local host: centos64
  PID:        96652
--------------------------------------------------------------------------

This happens for the first remote slave and, after this warning, the second
remote slave hangs.

Everything runs smoothly if I run all instances on the same host. Can anybody
give me a hint on what to check?

I am using openmpi-v4.0.x-201905010241-888d014 and the mpirun commands are
as follows:

On host centos64:

mpirun -H centos64 --ompi-server `cat /tmp/server.uri` \
    -np 1 /home/erico/master &
sleep 2
mpirun -H centos64 --ompi-server `cat /tmp/server.uri` \
    -np 1 /home/erico/slave -i 1 &
sleep 1
mpirun -H centos64 --ompi-server `cat /tmp/server.uri` \
    -np 1 /home/erico/slave -i 2 &
sleep 1


On host centos64-cl:

mpirun -H centos64-cl -oversubscribe --ompi-server `cat /tmp/server.uri` \
    -np 1 /home/erico/slave -i 3 &
mpirun -H centos64-cl -oversubscribe --ompi-server `cat /tmp/server.uri` \
    -np 1 /home/erico/slave -i 4 &
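The per-host mpirun lines above could also be wrapped in a small helper so that every host launches its slaves the same way. This is only a sketch under assumptions: launch_slaves, MPIRUN and URI_FILE are hypothetical names, --oversubscribe is passed on every host, and /tmp/server.uri is assumed to exist on the host where the helper runs (the backticks/$(cat ...) are expanded locally, so each host needs its own copy of the file).

```shell
#!/bin/sh
# Hypothetical helper: start slaves first..last on one host, all pointing
# at the same ompi-server. Assumes URI_FILE exists on the LOCAL host.
MPIRUN=${MPIRUN:-mpirun}               # override (e.g. MPIRUN=echo) for a dry run
URI_FILE=${URI_FILE:-/tmp/server.uri}  # assumed location of the server URI

launch_slaves() {
    host=$1; first=$2; last=$3
    # Read the URI once; fail early if the file is missing on this host.
    uri=$(cat "$URI_FILE") || return 1
    i=$first
    while [ "$i" -le "$last" ]; do
        "$MPIRUN" -H "$host" --oversubscribe --ompi-server "$uri" \
            -np 1 /home/erico/slave -i "$i" &
        i=$((i + 1))
        sleep 1   # same spacing between launches as in the commands above
    done
}
```

For example, `launch_slaves centos64-cl 3 4` would start slaves 3 and 4 on centos64-cl; setting MPIRUN=echo prints the mpirun commands instead of running them.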

Thanks in advance!!

Erico Silva
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users