Hello Ralph,

I need to support several different apps, each with entirely different MPI
communication needs, and each either single-threaded or multi-threaded.  For
example, one app tends to do very little message passing, while another does
much more.

Some of our end users are very performance-conscious, so we want to give them
tools for controlling performance.  And of course our end users will all be
running on different hardware.

So I wanted to do some benchmarking of the affinity options, in order to give
some guidelines to our end users.  My understanding is that it is necessary to
actually try the different affinity options, and that it is very difficult, if
not impossible, to predict beforehand which affinity options, if any, give a
performance benefit.
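
For example, the kind of comparison I have in mind would look roughly like the
following (just a sketch; the timings themselves would come from the apps):

.../openmpi-2.1.1/bin/mpirun --report-bindings --bind-to core   -np 2 afftest01.exe
.../openmpi-2.1.1/bin/mpirun --report-bindings --bind-to socket -np 2 afftest01.exe
.../openmpi-2.1.1/bin/mpirun --report-bindings --bind-to none   -np 2 afftest01.exe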

It is quite possible that our apps would work better with

MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]

instead of

MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]

but again there is no way to know this beforehand.  It is nice to have the
option to try both, which we could do in Open MPI 1.4.3.
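
In Open MPI 1.4.3, if I remember the flags correctly, the first layout would
come from something like

.../openmpi-1.4.3/bin/mpirun --report-bindings -bind-to-socket -np 2 afftest01.exe

(since the default by-core mapping fills socket 0 first), and the second layout
from

.../openmpi-1.4.3/bin/mpirun --report-bindings -bysocket -bind-to-socket -np 2 afftest01.exe

but this is only a sketch, and the exact behavior would have to be verified.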

Our apps all use app context files.  App context files are very convenient
since we can pass different options to the executable for each rank, in
particular, the pathname of the working directory that each rank uses.  And
the app context files are very readable, since not everything has to go on one
long mpirun command line.
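
A typical app context file for us looks something like this (the directory
names are just placeholders; each rank picks up its working directory from its
command-line argument):

-np 1 afftest01.exe /scratch/job1/rank0
-np 1 afftest01.exe /scratch/job1/rank1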

So for us it is important to have all of the affinity parameters work with the
app context files.

I tried an app context file of the format

> -np 1 afftest01.exe; -np 1 afftest01.exe

but it didn't work. Only rank 0 was created. Is there a different syntax that
will work?

Sincerely,

Ted Sussman




On 30 Jun 2017 at 7:41, r...@open-mpi.org wrote:

> Well, yes and no. Yes, your cpu loads will balance better across nodes 
> (balancing across sockets doesn't do much for you). However, your overall 
> application performance may be the poorest in that arrangement if your app 
> uses a lot of communication as the layout minimizes the use of shared memory.
>
> Laying out an app requires a little thought about its characteristics. If it 
> is mostly compute with a little communication, then spreading the procs out 
> makes the most sense. If it has a lot of communication, then compressing the 
> procs into the minimum space makes the most sense. This is the most commonly 
> used layout.
>
> I haven't looked at app context files in ages, but I think you could try this:
>
> -np 1 afftest01.exe; -np 1 afftest01.exe
>
>
> > On Jun 30, 2017, at 5:03 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> >
> > Hello Ralph,
> >
> > Thank you for your comments.
> >
> > My understanding, from reading Jeff's blog on V1.5 processor affinity, is 
> > that the bindings in
> > Example 1 balance the load better than the bindings in Example 2.
> >
> > Therefore I would like to obtain the bindings in Example 1, but using Open 
> > MPI 2.1.1, and
> > using application context files.
> >
> > How can I do this?
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 29 Jun 2017 at 19:09, r...@open-mpi.org wrote:
> >
> >>
> >> It's a difficult call to make as to which is the correct behavior. In 
> >> Example 1, you are executing a
> >> single app_context that has two procs in it. In Example 2, you are 
> >> executing two app_contexts,
> >> each with a single proc in it.
> >>
> >> Now some people say that the two should be treated the same, with the 
> >> second app_context in
> >> Example 2 being mapped starting from the end of the first app_context. In 
> >> this model, a
> >> comm_spawn would also start from the end of the earlier app_context, and 
> >> thus the new proc
> >> would not be on the same node (or socket, in this case) as its parent.
> >>
> >> Other people argue for the opposite behavior - that each app_context 
> >> should start from the first
> >> available slot in the allocation. In that model, a comm_spawn would result 
> >> in the first child
> >> occupying the same node (or socket) as its parent, assuming an available 
> >> slot.
> >>
> >> We've bounced around a bit on the behavior over the years as different 
> >> groups voiced their
> >> opinions. OMPI 1.4.3 is _very_ old and fell in the prior camp, while 2.1.1 
> >> is just released and is in
> >> the second camp. I honestly don't recall where the change occurred, or 
> >> even how consistent we
> >> have necessarily been over the years. It isn't something that people raise 
> >> very often.
> >>
> >> I've pretty much resolved to leave the default behavior as it currently 
> >> sits, but plan to add an option
> >> to support the alternative behavior as there seems no clear cut consensus 
> >> in the user community
> >> for this behavior. Not sure when I'll get to it - definitely not for the 
> >> 2.x series, and maybe not for 3.x
> >> since that is about to be released.
> >>
> >>    On Jun 29, 2017, at 11:24 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> >>
> >>    Hello all,
> >>
> >>    Today I have a problem with the --map-to socket feature of Open MPI 
> >> 2.1.1 when used with
> >>    application context files.
> >>
> >>    In the examples below, I am testing on a 2 socket computer, each socket 
> >> with 4 cores.
> >>
> >>    ---
> >>
> >>    Example 1:
> >>
> >>    .../openmpi-2.1.1/bin/mpirun --report-bindings \
> >>                -map-by socket \
> >>                -np 2 \
> >>                afftest01.exe
> >>
> >>    returns
> >>
> >>    ...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> >>    ...MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]
> >>
> >>    which is what I would expect.
> >>
> >>    ---
> >>
> >>    Example 2:
> >>
> >>    Create appfile as:
> >>
> >>    -np 1 afftest01.exe
> >>    -np 1 afftest01.exe
> >>
> >>    Then
> >>
> >>    .../openmpi-2.1.1/bin/mpirun --report-bindings \
> >>                -map-by socket \
> >>                -app appfile
> >>
> >>    returns
> >>
> >>    ...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> >>    ...MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]
> >>
> >>    which is not what I expect. I expect the same bindings as in Example 1.
> >>
> >>    ---
> >>
> >>    Example 3:
> >>
> >>    Using the same appfile as in Example 2,
> >>
> >>    .../openmpi-1.4.3/bin/mpirun --report-bindings \
> >>                -bysocket --bind-to-core  \
> >>                -app appfile
> >>
> >>    returns
> >>
> >>    ... odls:default:fork binding child ... to socket 0 cpus 0002
> >>    ... odls:default:fork binding child ... to socket 1 cpus 0001
> >>
> >>    which is what I would expect.  Here I use --bind-to-core just to get 
> >> the bindings printed.
> >>
> >>    ---
> >>
> >>    The examples show that the --map-by socket feature does not work as 
> >> expected when
> >>    application context files are used.  However the older -bysocket 
> >> feature worked as expected
> >>    in OpenMPI 1.4.3 when application context files are used.
> >>
> >>    If I am using the wrong syntax in Example 2, please let me know.
> >>
> >>    Sincerely,
> >>
> >>    Ted Sussman
> >>
> >>
> >
> >
> >
>

