[OMPI users] TCP usage in MPI singletons

2019-04-17 Thread Daniel Hemberger
Hi everyone,

I've been trying to track down the source of TCP connections when running
MPI singletons, with the goal of avoiding all TCP communication to free up
ports for other processes. I have a local apt install of OpenMPI 2.1.1 on
Ubuntu 18.04 which does not establish any TCP connections by default,
either when run as "mpirun -np 1 ./program" or as "./program". Since that
install has non-TCP alternatives for both the BTL (vader, self, etc.) and
OOB (ud and usock) frameworks, this result did not surprise me.
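(Roughly the kind of check I mean, as a sketch; it assumes the ss tool from
iproute2 is available, but netstat -tnp works just as well:

    # while ./program (or mpirun -np 1 ./program) is still running, in another shell:
    ss -tnp | grep program

The exact command doesn't matter, only whether any TCP sockets show up for
the process.)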

On a remote machine, I'm running the same test with an assortment of
OpenMPI versions (1.6.4, 1.8.6, 4.0.0, 4.0.1 on RHEL6 and 1.10.7 on RHEL7).
In all but 1.8.6 and 1.10.7, there is always a TCP connection established,
even if I disable the TCP BTL on the command line (e.g. "mpirun --mca btl
^tcp"). Therefore, I assumed this was because `tcp` was the only OOB
interface available in these installations. This TCP connection is
established both for "mpirun -np 1 ./program" and "./program".
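(A side note on the plain "./program" case: there is no mpirun command line
to attach --mca options to, so any such setting has to go through the
environment instead. MCA parameters can be set as OMPI_MCA_<name> variables;
a sketch of the equivalent of the BTL exclusion above:

    OMPI_MCA_btl=^tcp ./program

This should match "mpirun --mca btl ^tcp" for the BTL side, though it does
not affect the OOB layer.)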

The confusing part is that the 1.8.6 and 1.10.7 installations only appear
to establish a TCP connection when invoked with "mpirun -np 1 ./program",
but _not_ with "./program", even though their only OOB component is also
`tcp`. This result was not consistent with my understanding, so now I am
confused about when I should expect TCP communication to occur.

Is there a known explanation for what I am seeing? Is there actually a way
to get singletons to forgo all TCP communication, even if TCP is the only
OOB available, or is there something else at play here? I'd be happy to
provide any config.log files or ompi_info output if it would help.
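(For checking which components a given build actually provides, ompi_info is
enough; e.g. something along these lines:

    ompi_info | grep -E ' btl:| oob:'

lists the BTL and OOB components that were built.)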

For more context, the underlying issue I'm trying to resolve is that we are
(unfortunately) running many short instances of mpirun, and the TCP
connections are piling up in the TIME_WAIT state because we create them
faster than the kernel releases them.
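The pile-up is easy to watch; a sketch, assuming ss is available:

    ss -tn state time-wait | wc -l    # or: netstat -ant | grep -c TIME_WAIT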

Any advice or pointers would be greatly appreciated!

Thanks,
-Dan

Re: [OMPI users] TCP usage in MPI singletons

2019-04-19 Thread Daniel Hemberger
Hi Gilles, all,

Using `OMPI_MCA_ess_singleton_isolated=true ./program` achieves the desired
result of establishing no TCP connections for a singleton execution.
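Since these are ordinary MCA parameters, they can also be exported once in
the launching shell rather than prefixed to every command. A sketch,
assuming a bash-like shell:

    export OMPI_MCA_ess_singleton_isolated=true
    export OMPI_MCA_pml=ob1
    export OMPI_MCA_btl=vader,self
    ./program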

Thank you for the suggestion!

Best regards,
-Dan

On Wed, Apr 17, 2019 at 5:35 PM Gilles Gouaillardet wrote:

> Daniel,
>
>
> If your MPI singleton will never MPI_Comm_spawn(), then you can use the
> isolated mode like this
>
> OMPI_MCA_ess_singleton_isolated=true ./program
>
>
> You can also save some ports by blacklisting the btl/tcp component
>
>
> OMPI_MCA_ess_singleton_isolated=true OMPI_MCA_pml=ob1 \
>     OMPI_MCA_btl=vader,self ./program
>
>
> Cheers,
>
>
> Gilles