Well, FWIW, it looks like master (and hence 3.0) behaves the way you wanted:
$ mpirun -map-by socket --report-bindings --app ./appfile
[rhc001:48492] MCW rank 0: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
[rhc001:48492] MCW rank 1: [../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
$

with an appfile of

-n 1 hostname
-n 1 hostname

Interestingly, a "--map-by node" directive still winds up with the procs filling the first node before moving to the second. Not entirely sure what's going on there. Ditto when I add the "span" qualifier (--map-by socket:span) - all the procs stay on the first node until it is full, which isn't the expected behavior.

I very much doubt we'd backport the code supporting this to the v2.x series, so perhaps upgrade to 3.0 when it is released in the very near future?

As I said, this has gone back and forth too many times, so I'm going to "freeze" it at the 3.0 behavior (perhaps after exploring why --map-by node and the span qualifier aren't doing the expected thing for multiple app_contexts) and add the option from there.
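In case anyone wants to poke at this themselves, the reproducer is tiny - roughly the following (just a sketch; the hostname and the exact binding output will of course differ on other machines):

$ cat ./appfile
-n 1 hostname
-n 1 hostname

$ mpirun -map-by socket --report-bindings --app ./appfile        # ranks 0/1 land on different sockets on master/3.0
$ mpirun -map-by node --report-bindings --app ./appfile          # still fills the first node before moving to the second
$ mpirun -map-by socket:span --report-bindings --app ./appfile   # same thing - procs stay on the first node until it is full

Note that the appfile puts each app on its own line; the semicolon form that comes up further down the thread ("-np 1 afftest01.exe; -np 1 afftest01.exe") reportedly launched only rank 0.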
> On Jun 30, 2017, at 8:42 AM, Ted Sussman <ted.suss...@adina.com> wrote:
>
> Hello Ralph,
>
> I need to support several different apps, each app with entirely different MPI communication needs, and each app either single-threaded or multi-threaded. For example, one app tends to do very little message passing, and another app does much more message passing.
>
> And some of our end users are very performance-conscious, so we want to give our end users tools for controlling performance. And of course all of our end users will be running on different hardware.
>
> So I wanted to do some benchmarking of the affinity options, in order to give some guidelines to our end users. My understanding is that it is necessary to actually try the different affinity options, and that it is very difficult, if not impossible, to predict which affinity options, if any, give a performance benefit beforehand.
>
> It is quite possible that our apps would work better with
>
> MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]
>
> instead of
>
> MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]
>
> but again there is no way to know this beforehand. It is nice to have the option to try both, which we could do in Open MPI 1.4.3.
>
> Our apps all use app context files. App context files are very convenient since we can pass different options to the executable for each rank, in particular, the pathname of the working directory that each rank uses. And the app context files are very readable, since everything is not on one long mpirun command line.
>
> So for us it is important to have all of the affinity parameters work with the app context files.
>
> I tried an app context file of the format
>
> -np 1 afftest01.exe; -np 1 afftest01.exe
>
> but it didn't work. Only rank 0 was created. Is there a different syntax that will work?
>
> Sincerely,
>
> Ted Sussman
>
> I must say that I am surpi
>
> On 30 Jun 2017 at 7:41, r...@open-mpi.org wrote:
>
> > Well, yes and no. Yes, your cpu loads will balance better across nodes (balancing across sockets doesn't do much for you). However, your overall application performance may be the poorest in that arrangement if your app uses a lot of communication, as the layout minimizes the use of shared memory.
> >
> > Laying out an app requires a little thought about its characteristics. If it is mostly compute with a little communication, then spreading the procs out makes the most sense. If it has a lot of communication, then compressing the procs into the minimum space makes the most sense. This is the most commonly used layout.
> >
> > I haven't looked at app context files in ages, but I think you could try this:
> >
> > -np 1 afftest01.exe; -np 1 afftest01.exe
> >
> > > On Jun 30, 2017, at 5:03 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> > >
> > > Hello Ralph,
> > >
> > > Thank you for your comments.
> > >
> > > My understanding, from reading Jeff's blog on V1.5 processor affinity, is that the bindings in Example 1 balance the load better than the bindings in Example 2.
> > >
> > > Therefore I would like to obtain the bindings in Example 1, but using Open MPI 2.1.1, and using application context files.
> > >
> > > How can I do this?
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 29 Jun 2017 at 19:09, r...@open-mpi.org wrote:
> > >
> > >> It's a difficult call to make as to which is the correct behavior. In Example 1, you are executing a single app_context that has two procs in it. In Example 2, you are executing two app_contexts, each with a single proc in it.
> > >>
> > >> Now some people say that the two should be treated the same, with the second app_context in Example 2 being mapped starting from the end of the first app_context. In this model, a comm_spawn would also start from the end of the earlier app_context, and thus the new proc would not be on the same node (or socket, in this case) as its parent.
> > >>
> > >> Other people argue for the opposite behavior - that each app_context should start from the first available slot in the allocation. In that model, a comm_spawn would result in the first child occupying the same node (or socket) as its parent, assuming an available slot.
> > >>
> > >> We've bounced around a bit on the behavior over the years as different groups voiced their opinions. OMPI 1.4.3 is _very_ old and fell in the prior camp, while 2.1.1 is just released and is in the second camp. I honestly don't recall where the change occurred, or even how consistent we have necessarily been over the years. It isn't something that people raise very often.
> > >>
> > >> I've pretty much resolved to leave the default behavior as it currently sits, but plan to add an option to support the alternative behavior, as there seems to be no clear-cut consensus in the user community for this behavior. Not sure when I'll get to it - definitely not for the 2.x series, and maybe not for 3.x since that is about to be released.
> > >>
> > >> On Jun 29, 2017, at 11:24 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> > >>
> > >> Hello all,
> > >>
> > >> Today I have a problem with the --map-by socket feature of Open MPI 2.1.1 when used with application context files.
> > >>
> > >> In the examples below, I am testing on a 2-socket computer, each socket with 4 cores.
> > >>
> > >> ---
> > >>
> > >> Example 1:
> > >>
> > >> .../openmpi-2.1.1/bin/mpirun --report-bindings \
> > >> -map-by socket \
> > >> -np 2 \
> > >> afftest01.exe
> > >>
> > >> returns
> > >>
> > >> ...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> > >> ...MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]
> > >>
> > >> which is what I would expect.
> > >>
> > >> ---
> > >>
> > >> Example 2:
> > >>
> > >> Create appfile as:
> > >>
> > >> -np 1 afftest01.exe
> > >> -np 1 afftest01.exe
> > >>
> > >> Then
> > >>
> > >> .../openmpi-2.1.1/bin/mpirun --report-bindings \
> > >> -map-by socket \
> > >> -app appfile
> > >>
> > >> returns
> > >>
> > >> ...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
> > >> ...MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]
> > >>
> > >> which is not what I expect. I expect the same bindings as in Example 1.
> > >>
> > >> ---
> > >>
> > >> Example 3:
> > >>
> > >> Using the same appfile as in Example 2,
> > >>
> > >> .../openmpi-1.4.3/bin/mpirun --report-bindings \
> > >> -bysocket --bind-to-core \
> > >> -app appfile
> > >>
> > >> returns
> > >>
> > >> ... odls:default:fork binding child ... to socket 0 cpus 0002
> > >> ... odls:default:fork binding child ... to socket 1 cpus 0001
> > >>
> > >> which is what I would expect. Here I use --bind-to-core just to get the bindings printed.
> > >>
> > >> ---
> > >>
> > >> The examples show that the --map-by socket feature does not work as expected when application context files are used. However, the older -bysocket feature worked as expected in Open MPI 1.4.3 when application context files are used.
> > >>
> > >> If I am using the wrong syntax in Example 2, please let me know.
> > >>
> > >> Sincerely,
> > >>
> > >> Ted Sussman
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users