Todd,
Similar issues have also been reported when there is Network Address
Translation (NAT) between hosts; that occurred with KVM/QEMU virtual
machines running on the same host.
First, list the available interfaces on both nodes. Then try to
restrict Open MPI to a single interface that is known to work
(no firewall and no NAT), e.g.
mpirun --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 ...
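To compare the interfaces on both nodes, something like the following may help. It is only a sketch: it assumes a Linux node with iproute2 installed, and list_ipv4_ifaces is a hypothetical helper name, not an Open MPI tool. It reduces the one-line output of "ip -o -4 addr show" to "interface address/prefix" pairs.

```shell
# Hypothetical helper (not part of Open MPI): read the output of
# `ip -o -4 addr show` on stdin and print "interface address/prefix" pairs.
list_ipv4_ifaces() {
    # In the one-line format the fields are:
    #   index:  interface  family  address/prefix  ...
    awk '$3 == "inet" { print $2, $4 }'
}
```

On each node this could be run as "ip -o -4 addr show | list_ipv4_ifaces". An interface name that shows up on both nodes with addresses in the same subnet is a reasonable candidate for btl_tcp_if_include and oob_tcp_if_include.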
If that does not help, make sure there is no NAT:
on the first node, run
nc -v -l 1234
then on the other node, run
nc <ip of the first node> 1234
If you go back to the first node, you should see that the connection
came from the expected IP address of the second node.
If not, there is NAT somewhere, and that does not work well with Open MPI.
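The comparison in that last step can be written down as a small check. This is a sketch only: check_peer_ip is a hypothetical helper, and both addresses are typed in by hand (the one nc printed on the first node, and the one you expect the second node to use).

```shell
# Hypothetical helper: compare the source address reported by `nc -v -l`
# on the first node with the address the second node is expected to use.
check_peer_ip() {
    observed="$1"   # address printed by nc on the listening node
    expected="$2"   # address of the second node on the chosen interface
    if [ "$observed" = "$expected" ]; then
        echo "OK: connection arrived from $expected"
    else
        echo "MISMATCH: expected $expected but saw $observed (NAT or another interface?)"
        return 1
    fi
}
```

With the two addresses from the error message quoted below, "check_peer_ip 192.168.0.3 192.168.0.225" would report a mismatch.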
Cheers,
Gilles
On 3/28/2023 8:53 AM, Todd Spencer via users wrote:
OpenMPI Users,
I hope this email finds you all well. I am writing to bring to your
attention an issue that I have encountered while using OpenMPI.
I received the following error message while running a job:
"Open MPI detected an inbound MPI TCP connection request from a peer
that appears to be part of this MPI job (i.e., it identified itself as
part of this Open MPI job), but it is from an IP address that is
unexpected. This is highly unusual. The inbound connection has been
dropped, and the peer should simply try again with a different IP
interface (i.e., the job should hopefully be able to continue).
Local host: node02
Local PID: 17805
Peer hostname: node01 ([[23078,1],2])
Source IP of socket: 192.168.0.3
Known IPs of peer: 192.168.0.225"
I have tried to troubleshoot the issue, but to no avail. As a new user
in this area, I am not sure what could be causing it. I did try forcing
the nodes to talk to each other over eth0 with the "-mca
btl_tcp_if_include eth0" option, but it did not work.
I found a GitHub thread <https://github.com/open-mpi/ompi/issues/5818>
from 2018 that discussed the issue, but since I am new to this, a lot
of the subject matter went over my head. Could you please advise on
what could be causing this issue and how to resolve it? If you need
any additional information, I would be happy to provide it.
Thank you in advance for your help.
Best regards,
Todd