Hi Gilles,

  You're right, we no longer get the warnings... and the performance disparity
still exists, though to be clear it's only in select parts of the code -
others run as we'd expect.  This is probably why I initially guessed it was
a process/memory affinity issue - the one timer I looked at is in a
memory-intensive part of the code.  Now I'm wondering whether we're still
having binding issues (I need to do a comparison with a local system), or
whether it comes down to the cache size difference - the AWS C4 instances
have 25 MB of last-level cache per socket, and we have 45 MB per socket.  If
the working set fits in cache on our system and doesn't on theirs, that could
account for the difference.  Testing that is next up on my list, actually.
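
  Concretely, the test I have in mind is a STREAM-style triad timed at two
working-set sizes - one small enough to fit in either last-level cache and
one far too big for both.  Something roughly like the sketch below (the file
name and sizes are just placeholders, not our actual code):

/* cache_test.c - rough sketch of a triad bandwidth test at two working-set
 * sizes (placeholder values).  Build with e.g.:  cc -O2 cache_test.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

static double triad_gbs(size_t n, int reps)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = now();
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + 3.0 * c[i];
    double t1 = now();

    if (a[n / 2] != 7.0)   /* sanity check; also keeps the loop live */
        fprintf(stderr, "unexpected result\n");

    double gbytes = 3.0 * sizeof(double) * (double)n * reps / 1e9;
    free(a); free(b); free(c);
    return gbytes / (t1 - t0);
}

int main(void)
{
    /* ~12 MB total working set: fits in a 25 MB or a 45 MB cache */
    printf("small (in cache):  %6.1f GB/s\n", triad_gbs(500000, 500));
    /* ~720 MB total working set: streams from memory on either system */
    printf("large (in memory): %6.1f GB/s\n", triad_gbs(30000000, 20));
    return 0;
}

  If the small case runs at roughly the same speed on AWS and on our system,
and only the large case shows the gap, that would point at cache/memory
rather than binding.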

  Cheers,
  - Brian
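
  P.S. For the binding comparison itself, the plan is just to print each
rank's host and actual CPU mask from inside the job and compare AWS against
our local system.  Again only a sketch (the file name is made up, and this is
Linux-specific):

/* affinity_check.c - print each MPI rank's host and CPU affinity mask.
 * Build with:  mpicc -O2 affinity_check.c -o affinity_check */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cpu_set_t mask;
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);   /* 0 = the calling process */

    char host[64];
    gethostname(host, sizeof(host));

    char cpus[1024] = "";
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &mask)) {
            char buf[16];
            snprintf(buf, sizeof(buf), "%d ", c);
            strncat(cpus, buf, sizeof(cpus) - strlen(cpus) - 1);
        }

    printf("rank %d on %s: CPUs { %s}\n", rank, host, cpus);
    MPI_Finalize();
    return 0;
}

  Run under 'mpirun --bind-to core --report-bindings ./affinity_check', each
rank should report a single core if the binding is really taking effect, and
the kernel's view should match what --report-bindings prints.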


On Fri, Dec 22, 2017 at 7:55 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Brian,
>
> i have no doubt this was enough to get rid of the warning messages.
>
> out of curiosity, are you now seeing performance close to
> native runs?
> if i understand correctly, the linux kernel allocates memory on the
> closest NUMA domain (e.g. the socket, if i oversimplify), and since
> MPI tasks are bound by orted/mpirun before they are execv'ed, i have
> a hard time understanding how not binding MPI tasks to
> memory can have a significant impact on performance as long as they
> are bound to cores.
>
> Cheers,
>
> Gilles
>
>
> On Sat, Dec 23, 2017 at 7:27 AM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >
> > Hi Ralph,
> >
> >   Well, this gets chalked up to user error - the default AMI images come
> > without the NUMA-dev libraries, so Open MPI didn't get built with NUMA
> > support (and in my haste, I hadn't checked).  Oops.  Things seem to be
> > working correctly now.
> >
> >   Thanks again for your help,
> >   - Brian
> >
> >
> > On Fri, Dec 22, 2017 at 2:14 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> >>
> >> I honestly don’t know - will have to defer to Brian, who is likely out for
> >> at least the extended weekend. I’ll point this one to him when he returns.
> >>
> >>
> >> On Dec 22, 2017, at 1:08 PM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >>
> >>
> >>   Hi Ralph,
> >>
> >>   OK, that certainly makes sense - so the next question is, what prevents
> >> memory from being bound locally to particular cores?  Is this possible in
> >> a virtualized environment like AWS HVM instances?
> >>
> >>   And does this apply only to dynamic allocations within an instance, or
> >> static as well?  I'm pretty unfamiliar with how the hypervisor (KVM-based,
> >> I believe) maps out 'real' hardware, including memory, to particular
> >> instances.  We've seen some parts of the code (bandwidth-heavy) run ~10x
> >> faster on bare-metal hardware, though, presumably from memory locality, so
> >> it certainly has a big impact.
> >>
> >>   Thanks again, and merry Christmas!
> >>   - Brian
> >>
> >>
> >> On Fri, Dec 22, 2017 at 1:53 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> >>>
> >>> Actually, that message is telling you that binding to core is available,
> >>> but that we cannot bind memory to be local to that core. You can verify
> >>> the binding pattern by adding --report-bindings to your cmd line.
> >>>
> >>>
> >>> On Dec 22, 2017, at 11:58 AM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >>>
> >>>
> >>> Hi all,
> >>>
> >>>   We're testing a model on AWS using C4/C5 nodes, and some of our timers,
> >>> in a part of the code with no communication, show really poor performance
> >>> compared to native runs.  We think this is because we're not binding to a
> >>> core properly and thus not caching effectively, and a quick
> >>> 'mpirun --bind-to core hostname' does suggest issues with this on AWS:
> >>>
> >>> [bdobbins@head run]$ mpirun --bind-to core hostname
> >>>
> >>> --------------------------------------------------------------------------
> >>> WARNING: a request was made to bind a process. While the system
> >>> supports binding the process itself, at least one node does NOT
> >>> support binding memory to the process location.
> >>>
> >>>   Node:  head
> >>>
> >>> Open MPI uses the "hwloc" library to perform process and memory
> >>> binding. This error message means that hwloc has indicated that
> >>> processor binding support is not available on this machine.
> >>>
> >>>   (It also happens on compute nodes, and with real executables.)
> >>>
> >>>   Does anyone know how to enforce binding to cores on AWS instances?  Any
> >>> insight would be great.
> >>>
> >>>   Thanks,
> >>>   - Brian
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
