Dear Gilles, thanks a lot for your response!
1. You're right, my mistake: I had forgotten the "export" of OMP_PROC_BIND in my job script. Now this example works nearly as expected:

[pascal-1-07:25617] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-07:25617] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-0-06:02774] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-06:02774] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-07, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0001(pid 25634), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0002(pid 25634), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0003(pid 25634), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0004(pid 25634), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0005(pid 25634), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0006(pid 25634), cpu# 010, 0x00000400, Cpus_allowed_list: 10
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0007(pid 25634), cpu# 012, 0x00001000, Cpus_allowed_list: 12
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0008(pid 25634), cpu# 014, 0x00004000, Cpus_allowed_list: 14
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0009(pid 25634), cpu# 016, 0x00010000, Cpus_allowed_list: 16
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0010(pid 25634), cpu# 018, 0x00040000, Cpus_allowed_list: 18
MPI Instance 0002 of 0004 is on pascal-1-07, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0001(pid 25633), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0002(pid 25633), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0003(pid 25633), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0004(pid 25633), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0005(pid 25633), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0006(pid 25633), cpu# 011, 0x00000800, Cpus_allowed_list: 11
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0007(pid 25633), cpu# 013, 0x00002000, Cpus_allowed_list: 13
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0008(pid 25633), cpu# 015, 0x00008000, Cpus_allowed_list: 15
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0009(pid 25633), cpu# 017, 0x00020000, Cpus_allowed_list: 17
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0010(pid 25633), cpu# 019, 0x00080000, Cpus_allowed_list: 19
MPI Instance 0003 of 0004 is on pascal-0-06, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0001(pid 02787), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0002(pid 02787), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0003(pid 02787), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0004(pid 02787), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0005(pid 02787), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0006(pid 02787), cpu# 010, 0x00000400, Cpus_allowed_list: 10
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0007(pid 02787), cpu# 012, 0x00001000, Cpus_allowed_list: 12
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0008(pid 02787), cpu# 014, 0x00004000, Cpus_allowed_list: 14
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0009(pid 02787), cpu# 016, 0x00010000, Cpus_allowed_list: 16
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0010(pid 02787), cpu# 018, 0x00040000, Cpus_allowed_list: 18
MPI Instance 0004 of 0004 is on pascal-0-06, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0001(pid 02786), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0002(pid 02786), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0003(pid 02786), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0004(pid 02786), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0005(pid 02786), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0006(pid 02786), cpu# 011, 0x00000800, Cpus_allowed_list: 11
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0007(pid 02786), cpu# 013, 0x00002000, Cpus_allowed_list: 13
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0008(pid 02786), cpu# 015, 0x00008000, Cpus_allowed_list: 15
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0009(pid 02786), cpu# 017, 0x00020000, Cpus_allowed_list: 17
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0010(pid 02786), cpu# 019, 0x00080000, Cpus_allowed_list: 19

The only remaining question: why does "Cpus_allowed_list" of the Open MPI ranks still list the full range of all cores/hwthreads, while the OpenMP threads only use numbers 0-19 (as expected)?
2. I have a different scenario which still doesn't work as expected. Now I'd like to have 8 OpenMPI jobs on 2 nodes -> 4 OpenMPI jobs per node -> 2 per socket, each executing one OpenMP job with 5 threads:

mpirun -np 8 --map-by ppr:2:socket --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid

I'd like to have a binding like this (cores):

node 0, socket 0: 0+2+4+6+8 10+12+14+16+18
node 0, socket 1: 1+3+5+7+9 11+13+15+17+19
node 1, socket 0: 0+2+4+6+8 10+12+14+16+18
node 1, socket 1: 1+3+5+7+9 11+13+15+17+19

but as you can see below, all jobs are again bound to all cores of their socket, which leads to a situation like:

node 0, socket 0: 0+2+4+6+8 0+2+4+6+8
node 0, socket 1: 1+3+5+7+9 1+3+5+7+9
node 1, socket 0: 0+2+4+6+8 0+2+4+6+8
node 1, socket 1: 1+3+5+7+9 1+3+5+7+9

Could you give me a hint again how I could improve that?

[pascal-0-01:01972] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-01:01972] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-01:01972] MCW rank 2 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-0-01:01972] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-2-01:18506] MCW rank 4 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-01:18506] MCW rank 5 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-01:18506] MCW rank 6 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-2-01:18506] MCW rank 7 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0008 is on pascal-0-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0001(pid 01999), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0002(pid 01999), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0003(pid 01999), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0004(pid 01999), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0005(pid 01999), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0002 of 0008 is on pascal-0-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0001(pid 01996), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0002(pid 01996), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0003(pid 01996), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0004(pid 01996), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0005(pid 01996), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0003 of 0008 is on pascal-0-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0001(pid 01998), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0002(pid 01998), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0003(pid 01998), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0004(pid 01998), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0005(pid 01998), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0004 of 0008 is on pascal-0-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0001(pid 01997), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0002(pid 01997), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0003(pid 01997), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0004(pid 01997), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0005(pid 01997), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0005 of 0008 is on pascal-2-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0001(pid 18531), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0002(pid 18531), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0003(pid 18531), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0004(pid 18531), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0005(pid 18531), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0006 of 0008 is on pascal-2-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0001(pid 18530), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0002(pid 18530), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0003(pid 18530), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0004(pid 18530), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0005(pid 18530), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0007 of 0008 is on pascal-2-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0001(pid 18528), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0002(pid 18528), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0003(pid 18528), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0004(pid 18528), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0005(pid 18528), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0008 of 0008 is on pascal-2-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0001(pid 18527), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0002(pid 18527), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0003(pid 18527), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0004(pid 18527), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0005(pid 18527), cpu# 009, 0x00000200, Cpus_allowed_list: 9

Thanks a lot in advance for your advice, and have nice Easter days!

Ado

On 13.04.2017 08:48, Gilles Gouaillardet wrote:
> Heinz-Ado,
>
> it seems the OpenMP runtime did *not* bind the OMP threads at all as requested,
> and the root cause could be the OMP_PROC_BIND environment variable was not propagated
>
> can you try
>
> mpirun -x OMP_PROC_BIND ...
>
> and see if it helps ?
>
> Cheers,
>
> On 4/13/2017 12:23 AM, Heinz-Ado Arnolds wrote:
>> Dear Gilles,
>>
>> thanks for your answer.
>>
>> - compiler: gcc-6.3.0
>> - OpenMP environment vars: OMP_PROC_BIND=true, GOMP_CPU_AFFINITY not set
>> - hyperthread a given OpenMP thread is on: it's printed in the output below as a 3-digit number after the first ",", read by sched_getcpu() in the OpenMP test code
>> - the migration between cores/hyperthreads should be prevented by OMP_PROC_BIND=true
>> - I didn't find a migration, but the similar use of one core/hyperthread by two OpenMP threads in example "4"/"MPI Instance 0002": 011/031 are both on core #11.
>>
>> Are there any hints how to cleanly transfer the OpenMPI binding to the OpenMP tasks?
>>
>> Thanks and kind regards,
>>
>> Ado
>>
>> On 12.04.2017 15:40, Gilles Gouaillardet wrote:
>>> That should be a two steps tango
>>> - Open MPI bind a MPI task to a socket
>>> - the OpenMP runtime bind OpenMP threads to cores (or hyper threads) inside the socket assigned by Open MPI
>>>
>>> which compiler are you using ?
>>> do you set some environment variables to direct OpenMP to bind threads ?
>>>
>>> Also, how do you measure the hyperthread a given OpenMP thread is on ?
>>> is it the hyperthread used at a given time ? If yes, then the thread might migrate unless it was pinned by the OpenMP runtime.
>>>
>>> If you are not sure, please post the source of your program so we can have a look
>>>
>>> Last but not least, as long as OpenMP threads are pinned to distinct cores, you should not worry about them migrating between hyperthreads from the same core.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Wednesday, April 12, 2017, Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> wrote:
>>>
>>> Dear rhc,
>>>
>>> to make it more clear what I try to achieve, I collected some examples for several combinations of command line options. Would be great if you find time to look at these below. The most promising one is example "4".
>>>
>>> I'd like to have 4 MPI jobs starting 1 OpenMP job each with 10 threads, running on 2 nodes, each having 2 sockets, with 10 cores & 10 hwthreads. Only 10 cores (no hwthreads) should be used on each socket.
>>>
>>> 4 MPI -> 1 OpenMP with 10 threads (i.e. 4x10 threads)
>>> 2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
>>>
>>> 1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
>>>
>>> Machines :
>>> pascal-2-05...DE 20
>>> pascal-1-03...DE 20
>>>
>>> [pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> [pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid 28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid 28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid 28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid 28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid 28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid 28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid 28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid 28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid 28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid 28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid 28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid 28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid 28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid 28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid 28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid 28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid 28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid 28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid 28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid 28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>>
>>> I get a distribution to 4 sockets on 2 nodes as expected, but cores and corresponding hwthreads are used simultaneously:
>>> MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP thread #0007 runs on CPU 038,
>>> MP thread #0002 runs on CPU 014, MP thread #0008 runs on CPU 034
>>> according to "lscpu -a -e" CPUs 18/38 resp. 14/34 are the same physical cores
>>>
>>> 2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
>>>
>>> Machines :
>>> pascal-1-05...DE 20
>>> pascal-2-05...DE 20
>>>
>>> I get this warning:
>>>
>>> WARNING: a request was made to bind a process. While the system supports binding the process itself, at least one node does NOT support binding memory to the process location.
>>>
>>> Node: pascal-1-05
>>>
>>> Open MPI uses the "hwloc" library to perform process and memory binding. This error message means that hwloc has indicated that processor binding support is not available on this machine.
>>>
>>> On OS X, processor and memory binding is not available at all (i.e., the OS does not expose this functionality).
>>>
>>> On Linux, lack of the functionality can mean that you are on a platform where processor and memory affinity is not supported in Linux itself, or that hwloc was built without NUMA and/or processor affinity support. When building hwloc (which, depending on your Open MPI installation, may be embedded in Open MPI itself), it is important to have the libnuma header and library files available. Different linux distributions package these files under different names; look for packages with the word "numa" in them. You may also need a developer version of the package (e.g., with "dev" or "devel" in the name) to obtain the relevant header files.
>>>
>>> If you are getting this message on a non-OS X, non-Linux platform, then hwloc does not support processor / memory affinity on this platform. If the OS/platform does actually support processor / memory affinity, then you should contact the hwloc maintainers: https://github.com/open-mpi/hwloc.
>>>
>>> This is a warning only; your job will continue, though performance may be degraded.
>>>
>>> and these results:
>>>
>>> [pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
>>> MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05:
MP thread #0005(pid >>> 28929), 020, Cpus_allowed_list: 20 >>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid >>> 28929), 020, Cpus_allowed_list: 20 >>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid >>> 28929), 020, Cpus_allowed_list: 20 >>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid >>> 28929), 020, Cpus_allowed_list: 20 >>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid >>> 28929), 020, Cpus_allowed_list: 20 >>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid >>> 28929), 020, Cpus_allowed_list: 20 >>> >>> Only 2 CPUs are used and these are the same physical cores. >>> >>> 3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca >>> plm_rsh_agent "qrsh" -report-bindings ./myid >>> >>> Machines : >>> pascal-1-03...DE 20 >>> pascal-2-02...DE 20 >>> >>> I get a warning again: >>> >>> WARNING: a request was made to bind a process. While the system >>> supports binding the process itself, at least one node does NOT >>> support binding memory to the process location. >>> >>> Node: pascal-1-03 >>> >>> Open MPI uses the "hwloc" library to perform process and memory >>> binding. This error message means that hwloc has indicated that >>> processor binding support is not available on this machine. >>> >>> On OS X, processor and memory binding is not available at all >>> (i.e., >>> the OS does not expose this functionality). >>> >>> On Linux, lack of the functionality can mean that you are on a >>> platform where processor and memory affinity is not supported in >>> Linux >>> itself, or that hwloc was built without NUMA and/or processor >>> affinity >>> support. When building hwloc (which, depending on your Open MPI >>> installation, may be embedded in Open MPI itself), it is important >>> to >>> have the libnuma header and library files available. Different >>> linux >>> distributions package these files under different names; look for >>> packages with the word "numa" in them. 
>>> You may also need a developer version of the package (e.g., with
>>> "dev" or "devel" in the name) to obtain the relevant header files.
>>>
>>> If you are getting this message on a non-OS X, non-Linux platform,
>>> then hwloc does not support processor / memory affinity on this
>>> platform. If the OS/platform does actually support processor /
>>> memory affinity, then you should contact the hwloc maintainers:
>>> https://github.com/open-mpi/hwloc
>>>
>>> This is a warning only; your job will continue, though performance
>>> may be degraded.
>>>
>>> and these results:
>>>
>>> [pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]: [../../../../../../../../../..][B./../../../../../../../../..]
>>> [pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]: [../../../../../../../../../..][.B/../../../../../../../../..]
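As a side note, the `[B./..]`-style masks and the CPU-number collisions in these reports can be checked against the box's topology. The sketch below is purely illustrative and assumes the layout shown by `lscpu` further down in this thread (2 sockets with 10 cores each, where logical CPUs N and N+20 are hyperthread siblings of physical core N mod 20); the helper names are mine, not part of Open MPI.

```python
# Sketch: map logical CPU ids to (socket, physical core) for the
# 2-socket, 10-core, 2-hwthread layout from the lscpu output below.
# Assumption: logical CPUs N and N+20 are hyperthread siblings, and
# even-numbered cores sit on socket 0, odd-numbered cores on socket 1.

def physical_core(cpu: int) -> int:
    """Physical core id shared by a CPU and its hyperthread sibling."""
    return cpu % 20

def socket_of(cpu: int) -> int:
    """Socket holding the CPU's physical core (alternating on this box)."""
    return physical_core(cpu) % 2

def are_siblings(a: int, b: int) -> bool:
    """True if two logical CPUs are hyperthreads of the same core."""
    return physical_core(a) == physical_core(b)

# CPUs 0 and 20 collide on core 0, and CPUs 11 and 31 on core 11:
assert are_siblings(0, 20)
assert are_siblings(11, 31)
assert not are_siblings(0, 1)
```

With `--use-hwthread-cpus -bind-to hwthread`, each rank is given a single hardware thread, so two ranks can land on the two siblings of one physical core, exactly as the report above shows for hwt 0 and hwt 1 of core 0.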
>>> MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19371), 021, Cpus_allowed_list: 21
>>>
>>> The jobs are scheduled to one machine only.
>>>
>>> 4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca plm_rsh_agent "qrsh" -report-bindings ./myid
>>>
>>> Machines:
>>> pascal-1-00...DE 20
>>> pascal-3-00...DE 20
>>>
>>> [pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> [pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid 05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid 05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid 05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid 05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid 05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid 05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid 05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid 05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid 05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid 05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid 05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid 05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid 05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid 05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid 05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid 05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid 05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid 05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid 05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid 05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid 07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid 07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid 07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid 07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid 07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid 07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid 07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid 07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid 07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid 07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid 07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid 07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid 07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid 07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid 07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid 07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid 07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid 07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid 07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>>
>>> This distribution looks very good with the combination of options
>>> "--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking
>>> at "MPI Instance 0002", you'll find that "MP thread #0001" is executed
>>> on CPU 031 and "MP thread #0005" on CPU 011; 011/031 are the same
>>> physical core. All the others are perfect! Is this error due to a
>>> mistake on my part, or might there be a small remaining binding
>>> problem in Open MPI?
>>>
>>> I'd appreciate any hint very much!
>>>
>>> Kind regards,
>>>
>>> Ado
>>>
>>> On 11.04.2017 01:36, r...@open-mpi.org wrote:
>>> > I'm not entirely sure I understand your reference to "real cores".
>>> > When we bind you to a core, we bind you to all the HTs that comprise
>>> > that core. So, yes, with HT enabled, the binding report will list
>>> > things by HT, but you'll always be bound to the full core if you
>>> > tell us bind-to core.
>>> >
>>> > The default binding directive is bind-to socket when more than 2
>>> > processes are in the job, and that's what you are showing. You can
>>> > override that by adding "-bind-to core" to your cmd line if that is
>>> > what you desire.
>>> >
>>> > If you want to use individual HTs as independent processors, then
>>> > "--use-hwthread-cpus -bind-to hwthreads" would indeed be the right
>>> > combination.
>>> >
>>> >> On Apr 10, 2017, at 3:55 AM, Heinz-Ado Arnolds
>>> >> <arno...@mpa-garching.mpg.de> wrote:
>>> >>
>>> >> Dear OpenMPI users & developers,
>>> >>
>>> >> I'm trying to distribute my jobs (with SGE) to a machine with a
>>> >> certain number of nodes, each node having 2 sockets, each socket
>>> >> having 10 cores & 10 hyperthreads.
>>> >> I'd like to use only the real cores, no hyperthreading.
>>> >>
>>> >> lscpu -a -e
>>> >>
>>> >> CPU NODE SOCKET CORE L1d:L1i:L2:L3
>>> >> 0   0    0      0    0:0:0:0
>>> >> 1   1    1      1    1:1:1:1
>>> >> 2   0    0      2    2:2:2:0
>>> >> 3   1    1      3    3:3:3:1
>>> >> 4   0    0      4    4:4:4:0
>>> >> 5   1    1      5    5:5:5:1
>>> >> 6   0    0      6    6:6:6:0
>>> >> 7   1    1      7    7:7:7:1
>>> >> 8   0    0      8    8:8:8:0
>>> >> 9   1    1      9    9:9:9:1
>>> >> 10  0    0      10   10:10:10:0
>>> >> 11  1    1      11   11:11:11:1
>>> >> 12  0    0      12   12:12:12:0
>>> >> 13  1    1      13   13:13:13:1
>>> >> 14  0    0      14   14:14:14:0
>>> >> 15  1    1      15   15:15:15:1
>>> >> 16  0    0      16   16:16:16:0
>>> >> 17  1    1      17   17:17:17:1
>>> >> 18  0    0      18   18:18:18:0
>>> >> 19  1    1      19   19:19:19:1
>>> >> 20  0    0      0    0:0:0:0
>>> >> 21  1    1      1    1:1:1:1
>>> >> 22  0    0      2    2:2:2:0
>>> >> 23  1    1      3    3:3:3:1
>>> >> 24  0    0      4    4:4:4:0
>>> >> 25  1    1      5    5:5:5:1
>>> >> 26  0    0      6    6:6:6:0
>>> >> 27  1    1      7    7:7:7:1
>>> >> 28  0    0      8    8:8:8:0
>>> >> 29  1    1      9    9:9:9:1
>>> >> 30  0    0      10   10:10:10:0
>>> >> 31  1    1      11   11:11:11:1
>>> >> 32  0    0      12   12:12:12:0
>>> >> 33  1    1      13   13:13:13:1
>>> >> 34  0    0      14   14:14:14:0
>>> >> 35  1    1      15   15:15:15:1
>>> >> 36  0    0      16   16:16:16:0
>>> >> 37  1    1      17   17:17:17:1
>>> >> 38  0    0      18   18:18:18:0
>>> >> 39  1    1      19   19:19:19:1
>>> >>
>>> >> How do I have to choose the options & parameters of mpirun to
>>> >> achieve this behavior?
>>> >>
>>> >> mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
>>> >>
>>> >> distributes to
>>> >>
>>> >> [pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> >> [pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> >> [pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> >> [pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> >> MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> >> MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> >> MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> >> MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> >>
>>> >> i.e.: 2 nodes: ok, 2 sockets: ok, different sets of cores: ok, but
>>> >> it uses all hwthreads.
>>> >>
>>> >> I have tried several combinations of --use-hwthread-cpus and
>>> >> --bind-to hwthreads, but didn't find the right combination.
>>> >>
>>> >> Any hints would be greatly appreciated!
>>> >>
>>> >> Thanks a lot in advance,
>>> >>
>>> >> Heinz-Ado Arnolds
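For anyone wanting to reproduce the `Cpus_allowed_list` values quoted throughout this thread: on Linux they can be read from `/proc/self/status`, or obtained via `os.sched_getaffinity`. The `myid` program itself is not shown in the thread, so the following is only an illustrative sketch of that kind of diagnostic:

```python
import os
import re

def cpus_allowed_list() -> str:
    """Read the Cpus_allowed_list field from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        match = re.search(r"^Cpus_allowed_list:\s*(\S+)", f.read(), re.MULTILINE)
    return match.group(1) if match else ""

def allowed_cpus() -> set:
    """The same information via Python's affinity API: CPUs this process may run on."""
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("Cpus_allowed_list:", cpus_allowed_list())
    print("sched_getaffinity:", sorted(allowed_cpus()))
```

Run under `mpirun` with different `--bind-to` settings, a printout like this makes it immediately visible whether each rank (and each OpenMP thread, if called from inside a parallel region) is confined to the CPUs you expect.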
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users