Dear Ralph,

thanks for this new hint. Unfortunately I don't see how that would fulfill all 
my requirements:

I'd like to have 8 OpenMPI jobs on 2 nodes -> 4 OpenMPI jobs per node -> 2 per 
socket, each executing one OpenMP job with 5 threads:

   mpirun -np 8 --map-by ppr:4:node:pe=5 ...

How can I combine this with the constraint of 1 thread per core:

   [pascal-3-06:14965] ... [B./B./B./B./B./../../../../..][../../../../../../../../../..]
   [pascal-3-06:14965] ... [../../../../../B./B./B./B./B.][../../../../../../../../../..]
   [pascal-3-06:14965] ... [../../../../../../../../../..][B./B./B./B./B./../../../../..]
   [pascal-3-06:14965] ... [../../../../../../../../../..][../../../../../B./B./B./B./B.]
   [pascal-3-07:21027] ... [B./B./B./B./B./../../../../..][../../../../../../../../../..]
   [pascal-3-07:21027] ... [../../../../../B./B./B./B./B.][../../../../../../../../../..]
   [pascal-3-07:21027] ... [../../../../../../../../../..][B./B./B./B./B./../../../../..]
   [pascal-3-07:21027] ... [../../../../../../../../../..][../../../../../B./B./B./B./B.]
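
For what it's worth, here is a hedged sketch of what I've been experimenting 
with. The environment variables are standard OpenMP 4.0; whether they interact 
cleanly with the ":pe=" binding is exactly what I'm unsure about, so treat 
this as a sketch, not a working recipe:

```shell
# Sketch only: keep the ppr:4:node:pe=5 mapping from above and ask the
# OpenMP runtime to place one thread per physical core.
export OMP_NUM_THREADS=5    # 5 OpenMP threads per MPI rank
export OMP_PLACES=cores     # one place per physical core (OpenMP 4.0)
export OMP_PROC_BIND=close  # pack the 5 threads onto the 5 bound cores
# mpirun -np 8 --map-by ppr:4:node:pe=5 --report-bindings ./myid
```

The hope is that Open MPI restricts each rank to 5 cores (10 hwthreads) and 
the OpenMP runtime then picks one hwthread per core within that mask.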

Cheers,

Ado

On 22.04.2017 16:45, r...@open-mpi.org wrote:
> Sorry for delayed response. I’m glad that option solved the problem. We’ll 
> have to look at that configure option - shouldn’t be too hard.
> 
> As for the mapping you requested - no problem! Here’s the cmd line:
> 
> mpirun --map-by ppr:1:core --bind-to hwthread
> 
> Ralph
> 
>> On Apr 19, 2017, at 2:51 AM, Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> 
>> wrote:
>>
>> Dear Ralph, dear Gilles,
>>
>> thanks a lot for your help! The hints to use ":pe=<n>" and to install 
>> libnuma have been the keys to solve my problems.
>>
>> Perhaps it would not be a bad idea to include --enable-libnuma in the 
>> configure help, and make it the default, so that one has to specify 
>> --disable-libnuma to work without numactl. The option is already checked in 
>> configure (framework in opal/mca/hwloc/hwloc1112/hwloc/config/hwloc.m4).
>>
>> One question remains: I now get a binding like
>>  [pascal-3-06:03036] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 
>> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], 
>> socket 0[core 4[hwt 0-1]]: 
>> [BB/BB/BB/BB/BB/../../../../..][../../../../../../../../../..]
>> and OpenMP uses just "hwt 0" of each core, which is very welcome. But is 
>> there a way to get a binding like
>>  [pascal-3-06:03036] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 
>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 
>> 0[core 4[hwt 0]]: 
>> [B./B./B./B./B./../../../../..][../../../../../../../../../..]
>> from OpenMPI directly?
>>
>> Cheers and thanks again,
>>
>> Ado
>>
>> On 13.04.2017 17:34, r...@open-mpi.org wrote:
>>> Yeah, we need libnuma to set the memory binding. There is a param to turn 
>>> off the warning if installing libnuma is problematic, but it helps your 
>>> performance if the memory is kept local to the proc.
>>>
>>>> On Apr 13, 2017, at 8:26 AM, Heinz-Ado Arnolds 
>>>> <arno...@mpa-garching.mpg.de> wrote:
>>>>
>>>> Dear Ralph,
>>>>
>>>> thanks a lot for this valuable advice. Binding now works like expected!
>>>>
>>>> Since adding the ":pe=" option, I'm getting warnings like
>>>>
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>>    Node:  pascal-1-05
>>>> ...
>>>>
>>>> even if I choose parameters so that the binding is exactly the same as 
>>>> before, without ":pe=". I don't have libnuma installed on the cluster. 
>>>> Might that really be the cause of the warning?
>>>>
>>>> Thanks a lot, and have a nice Easter
>>>>
>>>> Ado
>>>>
>>>> On 13.04.2017 15:49, r...@open-mpi.org wrote:
>>>>> You can always specify a particular number of cpus to use for each 
>>>>> process by adding it to the map-by directive:
>>>>>
>>>>> mpirun -np 8 --map-by ppr:2:socket:pe=5 --use-hwthread-cpus 
>>>>> -report-bindings --mca plm_rsh_agent "qrsh" ./myid
>>>>>
>>>>> would map 2 processes to each socket, binding each process to 5 HTs on 
>>>>> that socket (since you told us to treat HTs as independent cpus). If you 
>>>>> want us to bind you to 5 cores, then you need to remove the 
>>>>> --use-hwthread-cpus directive.
>>>>>
>>>>> As I said earlier in this thread, we are actively working with the OpenMP 
>>>>> folks on a mechanism by which the two sides can coordinate these actions 
>>>>> so it will be easier to get the desired behavior. For now, though, 
>>>>> hopefully this will suffice.
>>>>>
>>>>>> On Apr 13, 2017, at 6:31 AM, Heinz-Ado Arnolds 
>>>>>> <arno...@mpa-garching.mpg.de> wrote:
>>>>>>
>>>>>> On 13.04.2017 15:20, gil...@rist.or.jp wrote:
>>>>>> ...
>>>>>>> in your second case, there are 2 things
>>>>>>> - MPI binds to socket, that is why two MPI tasks are assigned the same 
>>>>>>> hyperthreads
>>>>>>> - the GNU OpenMP runtime seems unable to figure out that 2 processes 
>>>>>>> use the same cores, and hence ends up binding the OpenMP threads to 
>>>>>>> the same cores.
>>>>>>> my best bet is that you should bind each MPI task to 5 cores instead 
>>>>>>> of one socket.
>>>>>>> i do not know the syntax off hand, and i am sure Ralph will help you 
>>>>>>> with that
>>>>>>
>>>>>> Thanks, would be great if someone has that syntax.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ado
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users@lists.open-mpi.org
>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>>>>
>>>>
>>>
>>>
>>
> 
