On Thu, Jul 23, 2020 at 9:35 PM Satish Balay <[email protected]> wrote:

> On Thu, 23 Jul 2020, Jeff Hammond wrote:
>
> > Open-MPI refuses to let users oversubscribe without an extra flag to
> > mpirun.
>
> Yes - and when using this flag - it lets the run go through - but there is
> still performance degradation in oversubscribe mode.
>
> > I think Intel MPI has an option for blocking poll that supports
> > oversubscription “nicely”.
>
> What option is this? Is it a compile-time option or something for mpiexec?
>

https://software.intel.com/content/www/us/en/develop/articles/tuning-the-intel-mpi-library-advanced-techniques.html

Apply wait mode to oversubscribed jobs

This option is particularly relevant for oversubscribed MPI jobs. The goal
is to enable the wait mode of the progress engine in order to wait for
messages without polling the fabric(s). This can save CPU cycles but
decreases the message-response rate (latency), so it should be used with
caution. To enable wait mode simply use:

I_MPI_WAIT_MODE=1
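
As a rough sketch of how one might apply this only when a job is actually oversubscribed: the helper below copies the environment and sets I_MPI_WAIT_MODE=1 when the requested rank count exceeds the core count. The function name `wait_mode_env`, its parameters, and the "more ranks than cores" test are my own assumptions for illustration, not something Intel MPI provides.

```python
import os


def wait_mode_env(nprocs, base_env=None, ncores=None):
    """Return a copy of the environment for launching an MPI job.

    Hypothetical helper: sets I_MPI_WAIT_MODE=1 (the Intel MPI setting
    quoted above) only when more ranks are requested than there are cores.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: "oversubscribed" means more MPI ranks than logical cores.
    cores = ncores if ncores is not None else (os.cpu_count() or 1)
    if nprocs > cores:
        env["I_MPI_WAIT_MODE"] = "1"
    return env


# Possible usage (./app is a placeholder executable):
#   import subprocess
#   subprocess.run(["mpiexec", "-n", "8", "./app"], env=wait_mode_env(8))
```

Runs that fit on the available cores keep the default polling behavior, so the latency cost mentioned above is only paid when oversubscribed.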


Jeff


> Satish
>
> > MPICH might have a “no local” option that
> > disables shared memory, in which case nemesis over libfabric with the
> > sockets or TCP provider _might_ do the right thing. But you should ask
> > MPICH people for details.
> >
> > Jeff
> >
> > On Thu, Jul 23, 2020 at 12:40 PM Jed Brown <[email protected]> wrote:
> >
> > > I think we should default to ch3:nemesis when --download-mpich, and only
> > > do ch3:sock when requested (which we would do in CI).
> > >
> > > Satish Balay via petsc-dev <[email protected]> writes:
> > >
> > > > Primarily because ch3:sock performance does not degrade in oversubscribe
> > > > mode - which is developer friendly - i.e., on your laptop.
> > > >
> > > > And folks doing optimized runs should use a properly tuned MPI for their
> > > > setup anyway.
> > > >
> > > > In this case --download-mpich-device=ch3:nemesis is likely appropriate
> > > > if using --download-mpich [and not using a separate/optimized MPI]
> > > >
> > > > Having defaults that satisfy all use cases is not practical.
> > > >
> > > > Satish
> > > >
> > > > On Wed, 22 Jul 2020, Matthew Knepley wrote:
> > > >
> > > > >> We default to ch3:sock. Scott MacLachlan just had a long thread on the
> > > > >> Firedrake list where it ended up that reconfiguring using ch3:nemesis had a
> > > > >> 2x performance boost on his 16-core proc, and noticeable effect on the 4
> > > > >> core speedup.
> > > >>
> > > >> Why do we default to sock?
> > > >>
> > > >>   Thanks,
> > > >>
> > > >>      Matt
> > > >>
> > > >>
> > >
> >
>
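
To make the trade-off in the quoted thread concrete, here is a minimal sketch that picks the MPICH device for PETSc's configure based on whether the build is expected to run oversubscribed (laptop, CI) or not. The function name `mpich_configure_args` and the boolean switch are hypothetical; the only things taken from the thread are the --download-mpich and --download-mpich-device options and the ch3:sock / ch3:nemesis device names.

```python
def mpich_configure_args(oversubscribed):
    """Return PETSc configure arguments selecting an MPICH device.

    Hypothetical helper reflecting the thread's reasoning:
      - ch3:sock    does not degrade when oversubscribed (developer/CI use)
      - ch3:nemesis is reported as faster when ranks fit on the cores
    """
    device = "ch3:sock" if oversubscribed else "ch3:nemesis"
    return ["--download-mpich", f"--download-mpich-device={device}"]


# Possible usage: build the command line for a non-oversubscribed build.
command = ["./configure"] + mpich_configure_args(oversubscribed=False)
```

This does not settle which one should be the default; it only shows that the choice reduces to a single configure argument.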
-- 
Jeff Hammond
[email protected]
http://jeffhammond.github.io/
