Dear Open MPI users & developers,

I'm trying to distribute my jobs (with SGE) to a machine with a certain
number of nodes, each node having 2 sockets and each socket having 10
physical cores plus their 10 hyperthread siblings. I'd like to use only the
physical cores, no hyperthreads.

lscpu -a -e

CPU NODE SOCKET CORE L1d:L1i:L2:L3
0   0    0      0    0:0:0:0      
1   1    1      1    1:1:1:1      
2   0    0      2    2:2:2:0      
3   1    1      3    3:3:3:1      
4   0    0      4    4:4:4:0      
5   1    1      5    5:5:5:1      
6   0    0      6    6:6:6:0      
7   1    1      7    7:7:7:1      
8   0    0      8    8:8:8:0      
9   1    1      9    9:9:9:1      
10  0    0      10   10:10:10:0   
11  1    1      11   11:11:11:1   
12  0    0      12   12:12:12:0   
13  1    1      13   13:13:13:1   
14  0    0      14   14:14:14:0   
15  1    1      15   15:15:15:1   
16  0    0      16   16:16:16:0   
17  1    1      17   17:17:17:1   
18  0    0      18   18:18:18:0   
19  1    1      19   19:19:19:1   
20  0    0      0    0:0:0:0      
21  1    1      1    1:1:1:1      
22  0    0      2    2:2:2:0      
23  1    1      3    3:3:3:1      
24  0    0      4    4:4:4:0      
25  1    1      5    5:5:5:1      
26  0    0      6    6:6:6:0      
27  1    1      7    7:7:7:1      
28  0    0      8    8:8:8:0      
29  1    1      9    9:9:9:1      
30  0    0      10   10:10:10:0   
31  1    1      11   11:11:11:1   
32  0    0      12   12:12:12:0   
33  1    1      13   13:13:13:1   
34  0    0      14   14:14:14:0   
35  1    1      15   15:15:15:1   
36  0    0      16   16:16:16:0   
37  1    1      17   17:17:17:1   
38  0    0      18   18:18:18:0   
39  1    1      19   19:19:19:1   
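
From this output, CPUs 0-19 are the first hardware thread of each of the 20
physical cores, and CPUs 20-39 are their hyperthread siblings (CPU n and CPU
n+20 share a core). One workaround I considered (just an untested sketch; I'm
assuming mpirun's --cpu-set option accepts such a range and that the CPU
numbering is identical on all nodes) would be to pin the job to those first
hardware threads explicitly:

# Untested sketch: restrict all ranks to CPUs 0-19, i.e. the first
# hardware thread of each physical core, keeping 2 ranks per node.
mpirun -np 4 --map-by ppr:2:node --cpu-set 0-19 \
       --mca plm_rsh_agent "qrsh" --report-bindings ./myid

But I'd prefer a proper mapping/binding policy over hard-coded CPU IDs.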

Which options and parameters of mpirun do I have to choose to achieve this
behavior?

mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" \
       --report-bindings ./myid

distributes the processes as follows:

[pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], 
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], 
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], 
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], 
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, 
Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, 
Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, 
Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, 
Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
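
For reference, the Cpus_allowed_list lines above come from my ./myid test
program; each rank essentially reports its kernel affinity mask, roughly
what this one-liner would print (assuming the usual Linux /proc layout):

# Per-rank equivalent of the Cpus_allowed_list output above:
grep Cpus_allowed_list /proc/self/status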

So: 2 nodes: OK, 2 sockets: OK, disjoint sets of cores: OK, but each rank is
bound to all hardware threads of its cores: the BB pairs in the binding maps
and the 20 entries in each Cpus_allowed_list show that both hyperthread
siblings of every core are included, which is exactly what I want to avoid.
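
If I read the --report-bindings notation correctly, what I would like to see
instead is each core bound only on its first hardware thread, i.e. maps like

[B./B./B./B./B./B./B./B./B./B.][../../../../../../../../../..]

rather than the BB pairs above.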

I have tried several combinations of --use-hwthread-cpus and --bind-to
hwthread, but didn't find the right combination.
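
For example, one of the variants was along these lines (a sketch from
memory, so the exact flags may have differed slightly):

# Attempted combination: treat hwthreads as independent cpus and bind
# at hwthread level; this did not give me one thread per core either.
mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus \
       --bind-to hwthread --mca plm_rsh_agent "qrsh" \
       --report-bindings ./myid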

Any hints would be greatly appreciated!

Thanks a lot in advance,

Heinz-Ado Arnolds