Hi Gilles,

My application had 11 processes running on 4 different hosts. These were started using 'mpirun -np 11....'.
After these messages, my application just got stuck and didn't move forward, and I had to interrupt it.

I do not know whether somebody else may have started another mpirun application on the same or an overlapping set of hosts (by invoking a different mpirun command). Could that interfere with my processes? I hope not.

Thanks,
Vipul

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet via users
Sent: Monday, May 4, 2020 11:36 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] Warnings

Hi,

How many MPI tasks are you running?
Are you running from a terminal? From two different jobs? Two mpirun within the same job?
What happens next? Hang? Abort? Crash? App runs just fine?

FWIW, the message says that rank 3 received an unexpected connection from rank 4.

Cheers,

Gilles

On Tue, May 5, 2020 at 9:08 AM Kulshrestha, Vipul via users <users@lists.open-mpi.org> wrote:
>
> Hi,
>
> Could somebody explain what these warnings imply? Is this caused when 2 distinct Open MPI applications end up running on the same machine?
>
> I am using version 4.0.1.
>
> Thanks,
> Vipul
>
> Message in the stdout of the application:
>
> [orw-med-fenway1][[61362,1],3][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[61362,1],4]
>
> Messages from mpirun:
>
> --------------------------------------------------------------------------
> WARNING: Open MPI accepted a TCP connection from what appears to be a
> another Open MPI process but cannot find a corresponding process
> entry for that peer.
>
> This attempted connection will be ignored; your MPI job may or may not
> continue properly.
>
> Local host: orw-med-fenway2
> PID: 10748
> --------------------------------------------------------------------------
> [orw-med-pats1:30498] 8 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [orw-med-pats1:30498] 4 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 1 more process has sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 1 more process has sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 1 more process has sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 9 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 3 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid