Dear OpenMPI users & developers,

I'm trying to distribute my jobs (with SGE) to a machine with a certain number of nodes, each node having 2 sockets, each socket having 10 cores with 2 hardware threads each (hyperthreading enabled). I would like to use only the real cores, no hyperthreading.

lscpu -a -e

CPU NODE SOCKET CORE L1d:L1i:L2:L3
0   0    0      0    0:0:0:0
1   1    1      1    1:1:1:1
2   0    0      2    2:2:2:0
3   1    1      3    3:3:3:1
4   0    0      4    4:4:4:0
5   1    1      5    5:5:5:1
6   0    0      6    6:6:6:0
7   1    1      7    7:7:7:1
8   0    0      8    8:8:8:0
9   1    1      9    9:9:9:1
10  0    0      10   10:10:10:0
11  1    1      11   11:11:11:1
12  0    0      12   12:12:12:0
13  1    1      13   13:13:13:1
14  0    0      14   14:14:14:0
15  1    1      15   15:15:15:1
16  0    0      16   16:16:16:0
17  1    1      17   17:17:17:1
18  0    0      18   18:18:18:0
19  1    1      19   19:19:19:1
20  0    0      0    0:0:0:0
21  1    1      1    1:1:1:1
22  0    0      2    2:2:2:0
23  1    1      3    3:3:3:1
24  0    0      4    4:4:4:0
25  1    1      5    5:5:5:1
26  0    0      6    6:6:6:0
27  1    1      7    7:7:7:1
28  0    0      8    8:8:8:0
29  1    1      9    9:9:9:1
30  0    0      10   10:10:10:0
31  1    1      11   11:11:11:1
32  0    0      12   12:12:12:0
33  1    1      13   13:13:13:1
34  0    0      14   14:14:14:0
35  1    1      15   15:15:15:1
36  0    0      16   16:16:16:0
37  1    1      17   17:17:17:1
38  0    0      18   18:18:18:0
39  1    1      19   19:19:19:1

How do I have to choose the options & parameters of mpirun to achieve this behavior?

mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid

distributes to

[pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]

MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

i.e.: 2 nodes: ok, 2 sockets: ok, different set of cores: ok, but each rank is allowed to use all hwthreads of its socket.

I have tried several combinations of --use-hwthread-cpus, --bind-to hwthreads, but didn't find the right combination. Any hints would be greatly appreciated.
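
To make the target concrete, here is a minimal Python sketch (not part of Open MPI; the function name first_hwthreads and the SAMPLE excerpt are purely illustrative) that derives, from lscpu -a -e output like the table above, which logical CPUs are the first hardware thread of each core. It assumes the first four columns are CPU, NODE, SOCKET, CORE, as in the listing above, and treats the lowest-numbered CPU of each core as the "real core".

```python
from collections import defaultdict


def first_hwthreads(lscpu_text):
    """Return {socket: [logical CPUs that are the first hwthread of a core]}."""
    rows = []
    for line in lscpu_text.strip().splitlines():
        fields = line.split()
        # Skip the header and any row whose first four columns are not numeric
        # (e.g. offline CPUs, which lscpu -a may show with "-").
        if len(fields) < 4 or not all(f.isdigit() for f in fields[:4]):
            continue
        cpu, _node, socket, core = (int(f) for f in fields[:4])
        rows.append((cpu, socket, core))

    seen_cores = set()
    per_socket = defaultdict(list)
    # Walk CPUs in ascending order so the lowest-numbered sibling wins.
    for cpu, socket, core in sorted(rows):
        if (socket, core) not in seen_cores:
            seen_cores.add((socket, core))
            per_socket[socket].append(cpu)
    return dict(per_socket)


# Excerpt of the table above: CPUs 0-3 and their hyperthread siblings 20-23.
SAMPLE = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3
0   0    0      0    0:0:0:0
1   1    1      1    1:1:1:1
2   0    0      2    2:2:2:0
3   1    1      3    3:3:3:1
20  0    0      0    0:0:0:0
21  1    1      1    1:1:1:1
22  0    0      2    2:2:2:0
23  1    1      3    3:3:3:1
"""

if __name__ == "__main__":
    for socket, cpus in sorted(first_hwthreads(SAMPLE).items()):
        print(f"socket {socket}: {','.join(map(str, cpus))}")
```

Applied to the full 40-row table, this gives 0,2,...,18 for socket 0 and 1,3,...,19 for socket 1, i.e. the CPU sets that would correspond to "real cores only", instead of the 20 CPUs per rank shown in the Cpus_allowed_list output above.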

Thanks a lot in advance,
Heinz-Ado Arnolds

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users