[OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread Brian Dobbins
Hi all,

  We're testing a model on AWS using C4/C5 nodes, and some of our timers, in
a part of the code with no communication, show really poor performance
compared to native runs.  We think this is because we're not binding to a
core properly, and thus not getting good cache reuse, and a quick 'mpirun
--bind-to core hostname' does suggest issues with this on AWS:

[bdobbins@head run]$ mpirun --bind-to core hostname
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  head

Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.

  (It also happens on compute nodes, and with real executables.)

  Does anyone know how to enforce binding to cores on AWS instances?  Any
insight would be great.

  Thanks,
  - Brian

Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread r...@open-mpi.org
Actually, that message is telling you that binding to core is available, but 
that we cannot bind memory to be local to that core. You can verify the binding 
pattern by adding --report-bindings to your cmd line.
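
For example (treating ./model_exe as a placeholder for your real executable),
something like:

  mpirun --bind-to core --report-bindings -np 4 ./model_exe

should print one line per rank to stderr, before the application starts,
showing which core(s) each process was bound to.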



Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread Brian Dobbins
  Hi Ralph,

  OK, that certainly makes sense - so the next question is, what prevents
binding memory to be local to particular cores?  Is this possible in a
virtualized environment like AWS HVM instances?

  And does this apply only to dynamic allocations within an instance, or
static as well?  I'm pretty unfamiliar with how the hypervisor (KVM-based,
I believe) maps out 'real' hardware, including memory, to particular
instances.  We've seen *some* parts of the code (bandwidth heavy) run ~10x
faster on bare-metal hardware, though, *presumably* from memory locality,
so it certainly has a big impact.
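
  For reference, assuming the hwloc and numactl command-line tools are
installed on the instance, something like the following should show whether
the guest even reports memory-binding support and what NUMA layout it exposes:

  hwloc-info --support | grep membind
  numactl --hardware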

  Thanks again, and merry Christmas!
  - Brian



Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread r...@open-mpi.org
I honestly don’t know - will have to defer to Brian, who is likely out for at 
least the extended weekend. I’ll point this one to him when he returns.



Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread Brian Dobbins
Hi Ralph,

  Well, this gets chalked up to user error - the default AMI images come
without the NUMA development libraries, so Open MPI didn't get built with
memory-binding support (and in my haste, I hadn't checked).  Oops.  Things
seem to be working correctly now.
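
  (In case it helps anyone else: on our images that meant installing the NUMA
development package - numactl-devel on yum-based systems, libnuma-dev on
apt-based ones, if I have the names right - and then reconfiguring and
rebuilding Open MPI, roughly along the lines of:

  sudo yum install -y numactl-devel
  ./configure --prefix=$HOME/openmpi && make -j && make install

Re-running 'mpirun --bind-to core --report-bindings hostname' afterwards is a
quick way to confirm the warning is gone.)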

  Thanks again for your help,
  - Brian



Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread Gilles Gouaillardet
Brian,

I have no doubt this was enough to get rid of the warning messages.

Out of curiosity, are you now seeing performance close to the native runs?
If I understand correctly, the Linux kernel allocates memory on the
closest NUMA domain (the local socket, to oversimplify), and since
MPI tasks are bound by orted/mpirun before they are execv'ed, I have
a hard time understanding how the lack of memory binding could have a
significant impact on performance as long as the tasks are bound to cores.
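
One way to double-check that assumption (assuming the numactl package is
installed, and treating model_exe as a placeholder for the executable name)
would be to look at where a running rank's pages actually ended up, e.g.:

numastat -p $(pgrep -f model_exe | head -1)

which reports that process' memory usage broken down per NUMA node.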

Cheers,

Gilles



Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread Brian Dobbins
Hi Gilles,

  You're right, we no longer get warnings... and the performance disparity
still exists, though to be clear it's only in select parts of the code -
others run as we'd expect.  This is probably why I initially guessed it was
a process/memory affinity issue - the one timer I looked at is in a
memory-intensive part of the code.  Now I'm wondering whether we're still
having binding issues (I need to do a comparison with a local system), or
whether it could be due to the cache size differences - the AWS C4 instances
have 25 MB of cache per socket, and we have 45 MB per socket.  If we fit in
cache on our system and don't on theirs, that could account for things.
Testing that
is next up on my list, actually.
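
  (A quick way to compare the two cache hierarchies, assuming lscpu and the
hwloc lstopo tool are available on both machines:

  lscpu | grep -i cache
  lstopo --no-io

That should show the cache sizes and how the cores share them on each system.)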

  Cheers,
  - Brian

