Dear Ralph, thanks for this new hint. Unfortunately I don't see how that would fulfill all my requirements:
I'd like to have 8 OpenMPI jobs on 2 nodes -> 4 OpenMPI jobs per node -> 2 per
socket, each executing one OpenMP job with 5 threads:

  mpirun -np 8 --map-by ppr:4:node:pe=5 ...

How can I combine this with the constraint of one thread per core (a possible
command line is sketched after the quoted thread below)? The bindings I am
after look like this:

[pascal-3-06:14965] ... [B./B./B./B./B./../../../../..][../../../../../../../../../..]
[pascal-3-06:14965] ... [../../../../../B./B./B./B./B.][../../../../../../../../../..]
[pascal-3-06:14965] ... [../../../../../../../../../..][B./B./B./B./B./../../../../..]
[pascal-3-06:14965] ... [../../../../../../../../../..][../../../../../B./B./B./B./B.]
[pascal-3-07:21027] ... [B./B./B./B./B./../../../../..][../../../../../../../../../..]
[pascal-3-07:21027] ... [../../../../../B./B./B./B./B.][../../../../../../../../../..]
[pascal-3-07:21027] ... [../../../../../../../../../..][B./B./B./B./B./../../../../..]
[pascal-3-07:21027] ... [../../../../../../../../../..][../../../../../B./B./B./B./B.]

Cheers,

Ado

On 22.04.2017 16:45, r...@open-mpi.org wrote:
> Sorry for the delayed response. I'm glad that option solved the problem. We'll
> have to look at that configure option - shouldn't be too hard.
>
> As for the mapping you requested - no problem! Here's the cmd line:
>
> mpirun --map-by ppr:1:core --bind-to hwthread
>
> Ralph
>
>> On Apr 19, 2017, at 2:51 AM, Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> wrote:
>>
>> Dear Ralph, dear Gilles,
>>
>> thanks a lot for your help! The hints to use ":pe=<n>" and to install
>> libnuma have been the keys to solving my problems.
>>
>> Perhaps it would not be a bad idea to include --enable-libnuma in the
>> configure help and make it the default, so that one has to specify
>> --disable-libnuma if one really wants to work without numactl. The option is
>> already checked in configure (framework in
>> opal/mca/hwloc/hwloc1112/hwloc/config/hwloc.m4).
>>
>> One question remains: I now get a binding like
>>
>> [pascal-3-06:03036] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]]:
>> [BB/BB/BB/BB/BB/../../../../..][../../../../../../../../../..]
>>
>> and OpenMP uses just "hwt 0" of each core, which is very welcome. But is
>> there a way to get a binding like
>>
>> [pascal-3-06:03036] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]]:
>> [B./B./B./B./B./../../../../..][../../../../../../../../../..]
>>
>> from OpenMPI directly?
>>
>> Cheers and thanks again,
>>
>> Ado
>>
>> On 13.04.2017 17:34, r...@open-mpi.org wrote:
>>> Yeah, we need libnuma to set the memory binding. There is a param to turn
>>> off the warning if installing libnuma is problematic, but it helps your
>>> performance if the memory is kept local to the proc.
>>>
>>>> On Apr 13, 2017, at 8:26 AM, Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> wrote:
>>>>
>>>> Dear Ralph,
>>>>
>>>> thanks a lot for this valuable advice. Binding now works as expected!
>>>>
>>>> Since adding the ":pe=" option I'm getting warnings
>>>>
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>> Node: pascal-1-05
>>>> ...
>>>>
>>>> even if I choose parameters so that the binding is exactly the same as
>>>> before without ":pe=". I don't have libnuma installed on the cluster.
>>>> Might that really be the cause of the warning?
>>>>
>>>> Thanks a lot, and have a nice Easter!
>>>>
>>>> Ado
>>>>
>>>> On 13.04.2017 15:49, r...@open-mpi.org wrote:
>>>>> You can always specify a particular number of cpus to use for each
>>>>> process by adding it to the map-by directive:
>>>>>
>>>>> mpirun -np 8 --map-by ppr:2:socket:pe=5 --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid
>>>>>
>>>>> would map 2 processes to each socket, binding each process to 5 HTs on
>>>>> that socket (since you told us to treat HTs as independent cpus). If you
>>>>> want us to bind you to 5 cores, then you need to remove the
>>>>> --use-hwthread-cpus directive.
>>>>>
>>>>> As I said earlier in this thread, we are actively working with the OpenMP
>>>>> folks on a mechanism by which the two sides can coordinate these actions,
>>>>> so it will be easier to get the desired behavior. For now, though,
>>>>> hopefully this will suffice.
>>>>>
>>>>>> On Apr 13, 2017, at 6:31 AM, Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> wrote:
>>>>>>
>>>>>> On 13.04.2017 15:20, gil...@rist.or.jp wrote:
>>>>>> ...
>>>>>>> in your second case, there are 2 things:
>>>>>>> - MPI binds to socket, which is why two MPI tasks are assigned the same
>>>>>>> hyperthreads
>>>>>>> - the GNU OpenMP runtime seems unable to figure out that 2 processes use
>>>>>>> the same cores, and hence ends up binding the OpenMP threads to the same
>>>>>>> cores.
>>>>>>> my best bet is that you should bind an MPI task to 5 cores instead of
>>>>>>> one socket. i do not know the syntax off hand, and i am sure Ralph will
>>>>>>> help you with that
>>>>>>
>>>>>> Thanks, would be great if someone has that syntax.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ado
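P.S.: To make the question concrete, here is the kind of command line I have in
mind. This is an untested sketch: it assumes the two-socket, 10-core nodes
shown in the bindings above, the ./myid test program from earlier in this
thread, and an OpenMP runtime that honours the standard OMP_PLACES and
OMP_PROC_BIND variables (GNU OpenMP does). Open MPI binds each rank to 5 full
cores; keeping only one thread per core is then left to the OpenMP runtime:

  # Untested sketch: 8 ranks over 2 nodes -> 2 ranks per socket, 5 cores per rank.
  # On these 2-socket nodes, ppr:2:socket:pe=5 gives the same layout as ppr:4:node:pe=5.
  export OMP_NUM_THREADS=5
  export OMP_PLACES=cores      # one OpenMP place per physical core -> at most 1 thread per core
  export OMP_PROC_BIND=close   # keep the 5 threads on the rank's own cores
  mpirun -np 8 --map-by ppr:2:socket:pe=5 --bind-to core \
         --report-bindings ./myid

With --bind-to core the binding report will still show both hwthreads of each
core (BB rather than B.), but OMP_PLACES=cores should ensure that only one
OpenMP thread runs per physical core, which is the behaviour I am after.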
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users