Well, FWIW, it looks like master (and hence 3.0) behaves the way you wanted:

$ mpirun -map-by socket --report-bindings --app ./appfile
[rhc001:48492] MCW rank 0: 
[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
[rhc001:48492] MCW rank 1: 
[../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
$

with an appfile of
-n 1 hostname
-n 1 hostname

Interestingly, a "--map-by node" directive still winds up with the procs 
filling the first node before moving to the second. Not entirely sure what’s 
going on there. Ditto when I add the “span” qualifier (--map-by socket:span) - 
all the procs stay on the first node until full, which isn’t the expected 
behavior.
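
For concreteness, these are the sorts of invocations in question (same two-line appfile as above, shown purely for illustration):

$ mpirun --map-by node --report-bindings --app ./appfile
$ mpirun --map-by socket:span --report-bindings --app ./appfile

In both cases I’d expect the procs to spread across the nodes rather than fill the first one.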

I very much doubt we’d backport the code supporting this stuff to the v2.x 
series, so perhaps upgrade to 3.0 when it gets released in the very near future?

As I said, this has gone back and forth too many times, so I’m going to “freeze” it 
at the 3.0 behavior (perhaps exploring why --map-by node and the span qualifier 
aren’t behaving as expected for multiple app_contexts) and add the option from 
there.


> On Jun 30, 2017, at 8:42 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> 
> Hello Ralph,
> 
> I need to support several different apps, each app with entirely different 
> MPI communication needs, and each app either single-threaded or 
> multi-threaded.  For example, one app tends to do very little message 
> passing, and another app does much more message passing.
> 
> And some of our end users are very performance-conscious, so we want to give 
> them tools for controlling performance.  And of course they will all be 
> running on different hardware.
> 
> So I wanted to do some benchmarking of the affinity options, in order to give 
> some guidelines to our end users.  My understanding is that it is necessary 
> to actually try the different affinity options, and that it is very 
> difficult, if not impossible, to predict beforehand which affinity options, 
> if any, give a performance benefit.
> 
> It is quite possible that our apps would work better with
> 
> MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]
> 
> instead of
> 
> MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]
> 
> but again there is no way to know this beforehand.  It is nice to have the 
> option to try both, which we could do in Open MPI 1.4.3.
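> 
> (If it helps, one thing that might give the first layout without an appfile, 
> assuming the default by-core mapping and that --bind-to socket behaves the 
> way I understand it, would be something like
> 
> .../openmpi-2.1.1/bin/mpirun --report-bindings \
>             --map-by core --bind-to socket \
>             -np 2 \
>             afftest01.exe
> 
> but I have not verified this.)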
> 
> Our apps all use app context files.  App context files are very convenient 
> since we can pass different options to the executable for each rank, in 
> particular, the pathname of the working directory that each rank uses.  And 
> the app context files are very readable, since not everything has to go on 
> one long mpirun command line.
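> 
> As a sketch of the kind of appfile we use (the executable name, option name, 
> and paths here are made up for illustration):
> 
> -np 1 ourapp.exe --workdir=/scratch/case1/rank0
> -np 1 ourapp.exe --workdir=/scratch/case1/rank1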
> 
> So for us it is important to have all of the affinity parameters work with 
> the app context files.
> 
> I tried an app context file of the format
> 
> > -np 1 afftest01.exe; -np 1 afftest01.exe
> 
> but it didn't work. Only rank 0 was created. Is there a different syntax that 
> will work?
> 
> Sincerely,
> 
> Ted Sussman
> 
> On 30 Jun 2017 at 7:41, r...@open-mpi.org wrote:
> 
> > Well, yes and no. Yes, your CPU loads will balance better across nodes 
> > (balancing across sockets doesn’t do much for you). However, your overall 
> > application performance may be poorest in that arrangement if your app 
> > uses a lot of communication, since that layout minimizes the use of shared 
> > memory.
> >
> > Laying out an app requires a little thought about its characteristics. If 
> > it is mostly compute with a little communication, then spreading the procs 
> > out makes the most sense. If it has a lot of communication, then 
> > compressing the procs into the minimum space makes the most sense; that is 
> > the most commonly used layout.
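> >
> > As a rough sketch (treat the exact options as illustrative, since they vary 
> > a bit by version): spreading the procs out might look like
> >
> >     mpirun --map-by node -np 8 ./myapp
> >
> > while packing them together might look like
> >
> >     mpirun --map-by core -np 8 ./myapp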
> >
> > I haven’t looked at app context files in ages, but I think you could try 
> > this:
> >
> > -np 1 afftest01.exe; -np 1 afftest01.exe
> >
> >
> > > On Jun 30, 2017, at 5:03 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> > >
> > > Hello Ralph,
> > >
> > > Thank you for your comments.
> > >
> > > My understanding, from reading Jeff's blog on V1.5 processor affinity, is 
> > > that the bindings in
> > > Example 1 balance the load better than the bindings in Example 2.
> > >
> > > Therefore I would like to obtain the bindings in Example 1, but using 
> > > Open MPI 2.1.1, and
> > > using application context files.
> > >
> > > How can I do this?
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 29 Jun 2017 at 19:09, r...@open-mpi.org wrote:
> > >
> > >>
> > >> It’s a difficult call to make as to which is the correct behavior. In 
> > >> Example 1, you are executing a
> > >> single app_context that has two procs in it. In Example 2, you are 
> > >> executing two app_contexts,
> > >> each with a single proc in it.
> > >>
> > >> Now some people say that the two should be treated the same, with the 
> > >> second app_context in
> > >> Example 2 being mapped starting from the end of the first app_context. 
> > >> In this model, a
> > >> comm_spawn would also start from the end of the earlier app_context, and 
> > >> thus the new proc
> > >> would not be on the same node (or socket, in this case) as its parent.
> > >>
> > >> Other people argue for the opposite behavior - that each app_context 
> > >> should start from the first
> > >> available slot in the allocation. In that model, a comm_spawn would 
> > >> result in the first child
> > >> occupying the same node (or socket) as its parent, assuming an available 
> > >> slot.
> > >>
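> > >> To make that concrete for a two-entry appfile on your two-socket 
> > >> machine, the first model would report something like
> > >>
> > >>    MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>    MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]
> > >>
> > >> while the second model would report
> > >>
> > >>    MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>    MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>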
> > >> We’ve bounced around a bit on the behavior over the years as different 
> > >> groups voiced their
> > >> opinions. OMPI 1.4.3 is _very_ old and fell in the prior camp, while 
> > >> 2.1.1 is just released and is in
> > >> the second camp. I honestly don’t recall where the change occurred, or 
> > >> even how consistent we
> > >> have necessarily been over the years. It isn’t something that people 
> > >> raise very often.
> > >>
> > >> I’ve pretty much resolved to leave the default behavior as it currently 
> > >> sits, but plan to add an option
> > >> to support the alternative behavior as there seems no clear cut 
> > >> consensus in the user community
> > >> for this behavior. Not sure when I’ll get to it - definitely not for the 
> > >> 2.x series, and maybe not for 3.x
> > >> since that is about to be released.
> > >>
> > >>    On Jun 29, 2017, at 11:24 AM, Ted Sussman <ted.suss...@adina.com> 
> > >> wrote:
> > >>
> > >>    Hello all,
> > >>
> > >>    Today I have a problem with the --map-to socket feature of Open MPI 
> > >> 2.1.1 when used with
> > >>    application context files.
> > >>
> > >>    In the examples below, I am testing on a 2 socket computer, each 
> > >> socket with 4 cores.
> > >>
> > >>    ---
> > >>
> > >>    Example 1:
> > >>
> > >>    .../openmpi-2.1.1/bin/mpirun --report-bindings \
> > >>                -map-by socket \
> > >>                -np 2 \
> > >>                afftest01.exe
> > >>
> > >>    returns
> > >>
> > >>    ...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>    ...MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]
> > >>
> > >>    which is what I would expect.
> > >>
> > >>    ---
> > >>
> > >>    Example 2:
> > >>
> > >>    Create appfile as:
> > >>
> > >>    -np 1 afftest01.exe
> > >>    -np 1 afftest01.exe
> > >>
> > >>    Then
> > >>
> > >>    .../openmpi-2.1.1/bin/mpirun --report-bindings \
> > >>                -map-by socket \
> > >>                -app appfile
> > >>
> > >>    returns
> > >>
> > >>    ...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>    ...MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>
> > >>    which is not what I expect. I expect the same bindings as in Example 
> > >> 1.
> > >>
> > >>    ---
> > >>
> > >>    Example 3:
> > >>
> > >>    Using the same appfile as in Example 2,
> > >>
> > >>    .../openmpi-1.4.3/bin/mpirun --report-bindings \
> > >>                -bysocket --bind-to-core  \
> > >>                -app appfile
> > >>
> > >>    returns
> > >>
> > >>    ... odls:default:fork binding child ... to socket 0 cpus 0002
> > >>    ... odls:default:fork binding child ... to socket 1 cpus 0001
> > >>
> > >>    which is what I would expect.  Here I use --bind-to-core just to get 
> > >> the bindings printed.
> > >>
> > >>    ---
> > >>
> > >>    The examples show that the --map-by socket feature does not work as 
> > >> expected when
> > >>    application context files are used.  However, the older -bysocket 
> > >> option worked as expected
> > >>    in Open MPI 1.4.3 when application context files are used.
> > >>
> > >>    If I am using the wrong syntax in Example 2, please let me know.
> > >>
> > >>    Sincerely,
> > >>
> > >>    Ted Sussman
> > >>
> > >>
> > >
> >
> 

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
