Hi Gilles,

You're right - we no longer get warnings, but the performance disparity still exists, though to be clear it's only in select parts of the code; others run as we'd expect. This is probably why I initially guessed it was a process/memory affinity issue - the one timer I looked at covers a memory-intensive part of the code. Now I'm wondering whether we're still having binding issues (I need to do a comparison with a local system), or whether it comes down to the cache size difference - the AWS C4 instances have 25MB of cache per socket, and our local nodes have 45MB per socket. If the working set fits in cache on our system and doesn't on theirs, that could account for things. Testing that is next up on my list, actually.
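In case it's useful, here's a rough sketch of the checks I have in mind - this assumes the standard hwloc and numactl command-line tools are installed on the instances, and './model' below is just a placeholder for our executable:

  # Is memory binding actually supported/visible inside the instance?
  hwloc-info --support | grep -i membind

  # Compare cache sizes and NUMA layout against a local node
  lscpu | grep -i cache
  numactl --hardware
  lstopo --no-io

  # Confirm each rank really lands on its own core
  mpirun --bind-to core --report-bindings -np 4 ./model

Comparing the lstopo output from a C4 instance against one of our local nodes should make the cache difference (and any NUMA detail the hypervisor hides) easy to see.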
Cheers,
  - Brian

On Fri, Dec 22, 2017 at 7:55 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Brian,
>
> i have no doubt this was enough to get rid of the warning messages.
>
> out of curiosity, are you now able to experience performances close to
> native runs ?
>
> if i understand correctly, the linux kernel allocates memory on the
> closest NUMA domain (e.g. socket if i oversimplify), and since
> MPI tasks are bound by orted/mpirun before they are execv'ed, i have
> some hard time understanding how not binding MPI tasks to
> memory can have a significant impact on performances as long as they
> are bound on cores.
>
> Cheers,
>
> Gilles
>
> On Sat, Dec 23, 2017 at 7:27 AM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >
> > Hi Ralph,
> >
> > Well, this gets chalked up to user error - the default AMI images come
> > without the NUMA-dev libraries, so OpenMPI didn't get built with it (and in
> > my haste, I hadn't checked). Oops. Things seem to be working correctly now.
> >
> > Thanks again for your help,
> >   - Brian
> >
> > On Fri, Dec 22, 2017 at 2:14 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> >>
> >> I honestly don’t know - will have to defer to Brian, who is likely out for
> >> at least the extended weekend. I’ll point this one to him when he returns.
> >>
> >> On Dec 22, 2017, at 1:08 PM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >>
> >> Hi Ralph,
> >>
> >> OK, that certainly makes sense - so the next question is, what prevents
> >> binding memory to be local to particular cores? Is this possible in a
> >> virtualized environment like AWS HVM instances?
> >>
> >> And does this apply only to dynamic allocations within an instance, or
> >> static as well? I'm pretty unfamiliar with how the hypervisor (KVM-based, I
> >> believe) maps out 'real' hardware, including memory, to particular
> >> instances. We've seen some parts of the code (bandwidth heavy) run ~10x
> >> faster on bare-metal hardware, though, presumably from memory locality, so
> >> it certainly has a big impact.
> >>
> >> Thanks again, and merry Christmas!
> >>   - Brian
> >>
> >> On Fri, Dec 22, 2017 at 1:53 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> >>>
> >>> Actually, that message is telling you that binding to core is available,
> >>> but that we cannot bind memory to be local to that core. You can verify the
> >>> binding pattern by adding --report-bindings to your cmd line.
> >>>
> >>> On Dec 22, 2017, at 11:58 AM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We're testing a model on AWS using C4/C5 nodes and some of our timers,
> >>> in a part of the code with no communication, show really poor performance
> >>> compared to native runs. We think this is because we're not binding to a
> >>> core properly and thus not caching, and a quick 'mpirun --bind-to core
> >>> hostname' does suggest issues with this on AWS:
> >>>
> >>> [bdobbins@head run]$ mpirun --bind-to core hostname
> >>> --------------------------------------------------------------------------
> >>> WARNING: a request was made to bind a process. While the system
> >>> supports binding the process itself, at least one node does NOT
> >>> support binding memory to the process location.
> >>>
> >>>   Node:  head
> >>>
> >>> Open MPI uses the "hwloc" library to perform process and memory
> >>> binding. This error message means that hwloc has indicated that
> >>> processor binding support is not available on this machine.
> >>>
> >>> (It also happens on compute nodes, and with real executables.)
> >>>
> >>> Does anyone know how to enforce binding to cores on AWS instances? Any
> >>> insight would be great.
> >>>
> >>> Thanks,
> >>>   - Brian
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users