Hi Gilles,

My application had 11 processes running on 4 different hosts. They 
were started using 'mpirun -np 11....'

After these messages, my application just got stuck and did not move forward, 
and I had to interrupt it.

I do not know whether somebody else may have started another mpirun application 
on the same or an overlapping set of hosts (by invoking a different mpirun 
command). Could that interfere with my processes? I hope not.

Thanks,
Vipul


-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles 
Gouaillardet via users
Sent: Monday, May 4, 2020 11:36 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] Warnings

Hi,

how many MPI tasks are you running?
are you running from a terminal? from two different jobs? two mpirun within the 
same job?
what happens next? hang? abort? crash? app runs just fine?

fwiw, the message says that rank 3 received an unexpected connection from rank 4
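
As a rough illustration of how that line can be read, here is a minimal Python sketch. It assumes the bracketed names follow the [[job family, local jobid], rank] layout, which is an interpretation rather than something stated in the log itself; the function and field names (parse_unexpected_peer, receiver, sender, same_job) are made up for this example.

```python
import re

# Assumed layout: [host][[family,local],rank][source] ... [[family,local],rank]
LINE_RE = re.compile(
    r"\[(?P<host>[^\]]+)\]"
    r"\[\[(?P<family>\d+),(?P<local>\d+)\],(?P<rank>\d+)\]"
    r".*received unexpected process identifier\s+"
    r"\[\[(?P<peer_family>\d+),(?P<peer_local>\d+)\],(?P<peer_rank>\d+)\]"
)


def parse_unexpected_peer(line):
    """Return host, receiver and sender names from the log line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    receiver = (int(m["family"]), int(m["local"]), int(m["rank"]))
    sender = (int(m["peer_family"]), int(m["peer_local"]), int(m["peer_rank"]))
    return {
        "host": m["host"],
        "receiver": receiver,
        "sender": sender,
        # True when both names carry the same job identifier pair.
        "same_job": receiver[:2] == sender[:2],
    }


if __name__ == "__main__":
    log_line = (
        "[orw-med-fenway1][[61362,1],3]"
        "[btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] "
        "received unexpected process identifier [[61362,1],4]"
    )
    print(parse_unexpected_peer(log_line))
```

If the first two numbers match on both sides, the two processes report the same job identifier; the last number on each side is the rank.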

Cheers,

Gilles

On Tue, May 5, 2020 at 9:08 AM Kulshrestha, Vipul via users 
<users@lists.open-mpi.org> wrote:
>
> Hi,
>
>
>
> Could somebody explain what these warnings imply? Are they caused when 2 
> distinct Open MPI applications end up running on the same machine?
>
>
>
> I am using version 4.0.1.
>
>
>
> Thanks,
> Vipul
>
>
>
> Message in the stdout of the application
>
>
>
> [orw-med-fenway1][[61362,1],3][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[61362,1],4]
>
>
>
> Messages from mpirun:
>
> --------------------------------------------------------------------------
> WARNING: Open MPI accepted a TCP connection from what appears to be a
> another Open MPI process but cannot find a corresponding process
> entry for that peer.
>
> This attempted connection will be ignored; your MPI job may or may not
> continue properly.
>
>   Local host: orw-med-fenway2
>   PID:        10748
> --------------------------------------------------------------------------
>
> [orw-med-pats1:30498] 8 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [orw-med-pats1:30498] 4 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 1 more process has sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 1 more process has sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 1 more process has sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 9 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
> [orw-med-pats1:30498] 3 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
