https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80822

            Bug ID: 80822
           Summary: libgomp incorrect affinity when OMP_PLACES=threads
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: weeks at iastate dot edu
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 41385
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41385&action=edit
xthi.c from Cray, Inc. modified to remove MPI code

On the NERSC Cori system, each Haswell node has two Intel Xeon E5-2698 v3
processors, each with 16 cores and HyperThreading enabled (64 hardware threads
per node). With OMP_PLACES=threads, libgomp from gcc 6.3.0 appears to assume
mistakenly that CPUs (hardware threads) 0 and 1 share a core. In reality CPUs 0
and 32 share a core, CPUs 1 and 33 share a core, and so on.
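To make the expected layout concrete, the place ordering the report implicitly expects can be written down. The sketch below is hypothetical: the helper names are invented and it is not libgomp's code. It assumes the Cori sibling pattern stated above, where CPU c and CPU c + 32 are hardware threads of the same core. It then builds an OMP_PLACES=threads list with one CPU per place, in which the hardware threads of each core are adjacent.

```c
#include <assert.h>

/* Assumed Cori Haswell sibling map (from thread_siblings_list / lstopo):
   CPU c and CPU c + 32 are hardware threads of core c % 32. */
int cori_core_of_cpu(int cpu)
{
    return cpu % 32;
}

/* Fill places[] with one CPU per place, ordered core by core so that the
   hardware threads of the same core are adjacent (ascending CPU id within
   a core). core_of_cpu[c] is the core id of CPU c, in 0..ncores-1.
   Returns the number of places written. */
int thread_places_by_core(const int *core_of_cpu, int ncpus, int ncores,
                          int *places)
{
    int n = 0;
    assert(ncpus > 0 && ncores > 0);
    for (int core = 0; core < ncores; core++)
        for (int cpu = 0; cpu < ncpus; cpu++)
            if (core_of_cpu[cpu] == core)
                places[n++] = cpu;
    return n;
}
```

For the Cori layout, this yields the places {0},{32},{1},{33},...,{31},{63}.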

To illustrate, the attached xthi-omp.c is a version of xthi.c from the "Cray XC
Series User Application Placement Guide (CLE 6.0.UP01) S-2496"
(https://pubs.cray.com/content/00330629-DC/FA00256413), modified to remove the
MPI code. The output of the Open MPI 1.10.2 "lstopo --of console" command
(lstopo.out), which shows the processor topology, is included at the bottom of
this report.
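The attached program is not reproduced here. Judging from its output, each thread reports the CPUs in its affinity mask. On Linux, that mask would typically come from sched_getaffinity(). The helper below is a hypothetical sketch of the printing step only. It formats an ascending list of CPU ids in the compact "0-3,32" style used by Linux CPU lists. The function name is invented, and the real xthi.c code may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an ascending list of CPU ids compactly, e.g. {0} -> "0",
   {0, 32} -> "0,32", {0, 1, 2, 3, 32} -> "0-3,32".
   The result is truncated (but always NUL-terminated) if buf is too small. */
char *format_cpu_list(const int *cpus, int n, char *buf, size_t len)
{
    size_t used = 0;
    assert(len > 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int j = i;
        /* extend the run while the CPU ids are consecutive */
        while (j + 1 < n && cpus[j + 1] == cpus[j] + 1)
            j++;
        const char *sep = used ? "," : "";
        int w = (j == i)
            ? snprintf(buf + used, len - used, "%s%d", sep, cpus[i])
            : snprintf(buf + used, len - used, "%s%d-%d", sep, cpus[i], cpus[j]);
        if (w < 0 || (size_t)w >= len - used)
            break; /* out of space: keep the truncated text */
        used += (size_t)w;
        i = j;
    }
    return buf;
}
```

With OMP_PLACES=threads, each thread should be bound to a single CPU. This is consistent with the single numbers printed in the runs below.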

In the first example (OMP_NUM_THREADS=32 OMP_PLACES=threads
OMP_PROC_BIND=spread), CPU cores 0, 2, 4, ..., 30 each have two OpenMP threads,
while CPU cores 1, 3, ..., 31 have none:

======================================================================
$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,32
$ gcc --version
gcc (GCC) 6.3.0 20161221 (Cray Inc.)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -fopenmp -o xthi-omp.x xthi-omp.c
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=spread ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 2)
Hello from thread 2, on nid00009. (core affinity = 4)
Hello from thread 3, on nid00009. (core affinity = 6)
Hello from thread 4, on nid00009. (core affinity = 8)
Hello from thread 5, on nid00009. (core affinity = 10) 
Hello from thread 6, on nid00009. (core affinity = 12) 
Hello from thread 7, on nid00009. (core affinity = 14) 
Hello from thread 8, on nid00009. (core affinity = 16) 
Hello from thread 9, on nid00009. (core affinity = 18) 
Hello from thread 10, on nid00009. (core affinity = 20) 
Hello from thread 11, on nid00009. (core affinity = 22) 
Hello from thread 12, on nid00009. (core affinity = 24) 
Hello from thread 13, on nid00009. (core affinity = 26) 
Hello from thread 14, on nid00009. (core affinity = 28) 
Hello from thread 15, on nid00009. (core affinity = 30) 
Hello from thread 16, on nid00009. (core affinity = 32) 
Hello from thread 17, on nid00009. (core affinity = 34) 
Hello from thread 18, on nid00009. (core affinity = 36) 
Hello from thread 19, on nid00009. (core affinity = 38) 
Hello from thread 20, on nid00009. (core affinity = 40) 
Hello from thread 21, on nid00009. (core affinity = 42) 
Hello from thread 22, on nid00009. (core affinity = 44) 
Hello from thread 23, on nid00009. (core affinity = 46) 
Hello from thread 24, on nid00009. (core affinity = 48) 
Hello from thread 25, on nid00009. (core affinity = 50) 
Hello from thread 26, on nid00009. (core affinity = 52) 
Hello from thread 27, on nid00009. (core affinity = 54) 
Hello from thread 28, on nid00009. (core affinity = 56) 
Hello from thread 29, on nid00009. (core affinity = 58) 
Hello from thread 30, on nid00009. (core affinity = 60) 
Hello from thread 31, on nid00009. (core affinity = 62) 
======================================================================
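The spread output above is consistent with a place list that simply follows CPU numbering ({0},{1},...,{63}). This is an inference from the output, not a reading of libgomp's source. Under that assumption, the OpenMP spread rule for T <= P threads splits the P places into T subpartitions of consecutive places and puts thread i on the first place of subpartition i. With P = 64 and T = 32, that sends thread i to CPU 2i, which doubles up on the even cores. The sketch below models only the evenly divisible case and assumes the master thread is on place 0. The names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of OMP_PROC_BIND=spread for T <= P when P % T == 0:
   thread i runs on the first place of the i-th block of P/T places.
   (The uneven split is left to the implementation and is not modelled.) */
int spread_place(int thread, int nthreads, int nplaces)
{
    assert(nthreads > 0 && nthreads <= nplaces && nplaces % nthreads == 0);
    return thread * (nplaces / nthreads);
}

/* Count threads per physical core. place_cpu[p] is the CPU backing place p,
   core_of_cpu[c] the core of CPU c; per_core must start zeroed. */
void count_spread_per_core(const int *place_cpu, int nplaces, int nthreads,
                           const int *core_of_cpu, int *per_core)
{
    for (int t = 0; t < nthreads; t++)
        per_core[core_of_cpu[place_cpu[spread_place(t, nthreads, nplaces)]]]++;
}
```

With place_cpu[p] = p and core_of_cpu[c] = c % 32, every even core ends up with two threads and every odd core with none. This matches the run above.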

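In the second example, OMP_PROC_BIND=close results in one OpenMP thread per
core. This is the opposite of the intended effect: with OMP_PLACES=threads, close
binding should pack consecutive threads onto both hardware threads of each core: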

======================================================================
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=close ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 1)
Hello from thread 2, on nid00009. (core affinity = 2)
Hello from thread 3, on nid00009. (core affinity = 3)
Hello from thread 4, on nid00009. (core affinity = 4)
Hello from thread 5, on nid00009. (core affinity = 5)
Hello from thread 6, on nid00009. (core affinity = 6)
Hello from thread 7, on nid00009. (core affinity = 7)
Hello from thread 8, on nid00009. (core affinity = 8)
Hello from thread 9, on nid00009. (core affinity = 9)
Hello from thread 10, on nid00009. (core affinity = 10)
Hello from thread 11, on nid00009. (core affinity = 11)
Hello from thread 12, on nid00009. (core affinity = 12)
Hello from thread 13, on nid00009. (core affinity = 13)
Hello from thread 14, on nid00009. (core affinity = 14)
Hello from thread 15, on nid00009. (core affinity = 15)
Hello from thread 16, on nid00009. (core affinity = 16)
Hello from thread 17, on nid00009. (core affinity = 17)
Hello from thread 18, on nid00009. (core affinity = 18)
Hello from thread 19, on nid00009. (core affinity = 19)
Hello from thread 20, on nid00009. (core affinity = 20)
Hello from thread 21, on nid00009. (core affinity = 21)
Hello from thread 22, on nid00009. (core affinity = 22)
Hello from thread 23, on nid00009. (core affinity = 23)
Hello from thread 24, on nid00009. (core affinity = 24)
Hello from thread 25, on nid00009. (core affinity = 25)
Hello from thread 26, on nid00009. (core affinity = 26)
Hello from thread 27, on nid00009. (core affinity = 27)
Hello from thread 28, on nid00009. (core affinity = 28)
Hello from thread 29, on nid00009. (core affinity = 29)
Hello from thread 30, on nid00009. (core affinity = 30)
Hello from thread 31, on nid00009. (core affinity = 31)
======================================================================
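The close result follows in the same way. Assume the same CPU-numbered place list and the master thread on place 0. Close binding then puts thread i on place i, i.e. CPU i, so the 32 threads land on 32 different cores. Below is a hypothetical sketch of that arithmetic. The names are invented, and it models the OpenMP rule rather than libgomp's code.

```c
#include <assert.h>

/* Hypothetical model of OMP_PROC_BIND=close for T <= P: thread i is bound
   to the place i positions after the master's place, wrapping around. */
int close_place(int thread, int master_place, int nplaces)
{
    assert(nplaces > 0 && thread >= 0 && master_place >= 0);
    return (master_place + thread) % nplaces;
}

/* How many of the first nthreads threads (master on place 0) end up on the
   given core. place_cpu[p] is the CPU backing place p. */
int close_threads_on_core(const int *place_cpu, int nplaces, int nthreads,
                          const int *core_of_cpu, int core)
{
    int n = 0;
    for (int t = 0; t < nthreads; t++)
        if (core_of_cpu[place_cpu[close_place(t, 0, nplaces)]] == core)
            n++;
    return n;
}
```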

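The Intel 17.0.2 OpenMP runtime binds threads as expected in both cases. With
spread, each core gets one thread (CPUs 0-31). With close, cores 0-15 each get
two threads (CPUs 0 and 32, 1 and 33, and so on):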

======================================================================
$ icc --version
icc (ICC) 17.0.2 20170213
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

$ icc -qopenmp -o ./xthi-omp.x xthi-omp.c
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=spread ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 1)
Hello from thread 2, on nid00009. (core affinity = 2)
Hello from thread 3, on nid00009. (core affinity = 3)
Hello from thread 4, on nid00009. (core affinity = 4)
Hello from thread 5, on nid00009. (core affinity = 5)
Hello from thread 6, on nid00009. (core affinity = 6)
Hello from thread 7, on nid00009. (core affinity = 7)
Hello from thread 8, on nid00009. (core affinity = 8)
Hello from thread 9, on nid00009. (core affinity = 9)
Hello from thread 10, on nid00009. (core affinity = 10)
Hello from thread 11, on nid00009. (core affinity = 11)
Hello from thread 12, on nid00009. (core affinity = 12)
Hello from thread 13, on nid00009. (core affinity = 13)
Hello from thread 14, on nid00009. (core affinity = 14)
Hello from thread 15, on nid00009. (core affinity = 15)
Hello from thread 16, on nid00009. (core affinity = 16)
Hello from thread 17, on nid00009. (core affinity = 17)
Hello from thread 18, on nid00009. (core affinity = 18)
Hello from thread 19, on nid00009. (core affinity = 19)
Hello from thread 20, on nid00009. (core affinity = 20)
Hello from thread 21, on nid00009. (core affinity = 21)
Hello from thread 22, on nid00009. (core affinity = 22)
Hello from thread 23, on nid00009. (core affinity = 23)
Hello from thread 24, on nid00009. (core affinity = 24)
Hello from thread 25, on nid00009. (core affinity = 25)
Hello from thread 26, on nid00009. (core affinity = 26)
Hello from thread 27, on nid00009. (core affinity = 27)
Hello from thread 28, on nid00009. (core affinity = 28)
Hello from thread 29, on nid00009. (core affinity = 29)
Hello from thread 30, on nid00009. (core affinity = 30)
Hello from thread 31, on nid00009. (core affinity = 31)
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=close ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 32)
Hello from thread 2, on nid00009. (core affinity = 1)
Hello from thread 3, on nid00009. (core affinity = 33)
Hello from thread 4, on nid00009. (core affinity = 2)
Hello from thread 5, on nid00009. (core affinity = 34)
Hello from thread 6, on nid00009. (core affinity = 3)
Hello from thread 7, on nid00009. (core affinity = 35)
Hello from thread 8, on nid00009. (core affinity = 4)
Hello from thread 9, on nid00009. (core affinity = 36)
Hello from thread 10, on nid00009. (core affinity = 5)
Hello from thread 11, on nid00009. (core affinity = 37)
Hello from thread 12, on nid00009. (core affinity = 6)
Hello from thread 13, on nid00009. (core affinity = 38)
Hello from thread 14, on nid00009. (core affinity = 7)
Hello from thread 15, on nid00009. (core affinity = 39)
Hello from thread 16, on nid00009. (core affinity = 8)
Hello from thread 17, on nid00009. (core affinity = 40)
Hello from thread 18, on nid00009. (core affinity = 9)
Hello from thread 19, on nid00009. (core affinity = 41)
Hello from thread 20, on nid00009. (core affinity = 10)
Hello from thread 21, on nid00009. (core affinity = 42)
Hello from thread 22, on nid00009. (core affinity = 11)
Hello from thread 23, on nid00009. (core affinity = 43)
Hello from thread 24, on nid00009. (core affinity = 12)
Hello from thread 25, on nid00009. (core affinity = 44)
Hello from thread 26, on nid00009. (core affinity = 13)
Hello from thread 27, on nid00009. (core affinity = 45)
Hello from thread 28, on nid00009. (core affinity = 14)
Hello from thread 29, on nid00009. (core affinity = 46)
Hello from thread 30, on nid00009. (core affinity = 15)
Hello from thread 31, on nid00009. (core affinity = 47)
======================================================================
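Both Intel results match what the same two binding rules give if the place list groups each core's hardware threads together, as in the lstopo output below: {0},{32},{1},{33},...,{31},{63}. The sketch below hard-codes that assumed 64-place Cori layout, where CPU c and c + 32 share core c, and puts the master thread on place 0. It models the OpenMP binding rules, not the Intel runtime's implementation, and the names are hypothetical.

```c
#include <assert.h>

enum bind_policy { BIND_CLOSE, BIND_SPREAD };

/* CPU that thread t is bound to when nthreads threads are placed on the
   assumed core-grouped list {0},{32},{1},{33},...,{31},{63}. */
int cpu_for_thread(enum bind_policy policy, int t, int nthreads)
{
    const int nplaces = 64, ncores = 32;
    int place;
    assert(nthreads > 0 && nthreads <= nplaces && t >= 0 && t < nthreads);
    if (policy == BIND_SPREAD) {
        assert(nplaces % nthreads == 0); /* only the even split is modelled */
        place = t * (nplaces / nthreads);
    } else {
        place = t % nplaces;
    }
    /* place 2k is the first hardware thread of core k, place 2k+1 the second */
    int core = place / 2;
    return (place % 2 == 0) ? core : core + ncores;
}
```

With 32 threads, spread gives CPUs 0, 1, ..., 31, and close gives CPUs 0, 32, 1, 33, ..., 15, 47. These are the two Intel listings above.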

Output of "lstopo --of console":

======================================================================
Machine (126GB total)
  NUMANode L#0 (P#0 63GB) + Package L#0 + L3 L#0 (40MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#32)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#33)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#34)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#35)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#36)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#37)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#38)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#39)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#40)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#41)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#10)
      PU L#21 (P#42)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#11)
      PU L#23 (P#43)
    L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
      PU L#24 (P#12)
      PU L#25 (P#44)
    L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
      PU L#26 (P#13)
      PU L#27 (P#45)
    L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
      PU L#28 (P#14)
      PU L#29 (P#46)
    L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
      PU L#30 (P#15)
      PU L#31 (P#47)
  NUMANode L#1 (P#1 63GB) + Package L#1 + L3 L#1 (40MB)
    L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
      PU L#32 (P#16)
      PU L#33 (P#48)
    L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
      PU L#34 (P#17)
      PU L#35 (P#49)
    L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
      PU L#36 (P#18)
      PU L#37 (P#50)
    L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
      PU L#38 (P#19)
      PU L#39 (P#51)
    L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
      PU L#40 (P#20)
      PU L#41 (P#52)
    L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
      PU L#42 (P#21)
      PU L#43 (P#53)
    L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22
      PU L#44 (P#22)
      PU L#45 (P#54)
    L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23
      PU L#46 (P#23)
      PU L#47 (P#55)
    L2 L#24 (256KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24
      PU L#48 (P#24)
      PU L#49 (P#56)
    L2 L#25 (256KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25
      PU L#50 (P#25)
      PU L#51 (P#57)
    L2 L#26 (256KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26
      PU L#52 (P#26)
      PU L#53 (P#58)
    L2 L#27 (256KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27
      PU L#54 (P#27)
      PU L#55 (P#59)
    L2 L#28 (256KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28
      PU L#56 (P#28)
      PU L#57 (P#60)
    L2 L#29 (256KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29
      PU L#58 (P#29)
      PU L#59 (P#61)
    L2 L#30 (256KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30
      PU L#60 (P#30)
      PU L#61 (P#62)
    L2 L#31 (256KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31
      PU L#62 (P#31)
      PU L#63 (P#63)
======================================================================
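The kernel also exposes the sibling pairs that lstopo shows (P#0 and P#32 under Core L#0, and so on) in the thread_siblings_list file that was printed for cpu0 above. Below is a minimal, Linux-only sketch for reading and parsing that file. The function names are hypothetical. It handles only the plain comma/range list format, e.g. "0,32" or "0-3,32-35".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a CPU list such as "0,32" or "0-3,32-35" into out[], in the order
   given. Returns the number of ids stored, or -1 if the string is malformed
   or would produce more than max ids. */
int parse_cpulist(const char *s, int *out, int max)
{
    int n = 0;
    assert(s != NULL && out != NULL);
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10), hi = lo;
        if (end == s || lo < 0)
            return -1;
        if (*end == '-') {               /* range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        for (long c = lo; c <= hi; c++) {
            if (n == max)
                return -1;
            out[n++] = (int)c;
        }
        s = end;
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return -1;
    }
    return n;
}

/* Read and parse the hardware-thread siblings of one CPU from sysfs. */
int read_thread_siblings(int cpu, int *out, int max)
{
    char path[128], line[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_cpulist(line, out, max) : -1;
}
```

On the node above, read_thread_siblings(0, ids, 8) would be expected to return 2 with ids = {0, 32}, matching the cat output.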
