On Tue, Oct 08, 2019 at 05:53:11PM +0200, Vincent Guittot wrote:
> Hi Phil,
>

...

> While preparing v4, I have noticed that I have probably oversimplified
> the end of find_idlest_group() in the patch "sched/fair: optimize
> find_idlest_group" when it compares the local group vs the idlest
> other group. In particular, there was a NUMA-specific test that I
> removed in v3 and re-added in v4.
>
> Then, I'm also preparing a full rework of find_idlest_group() which
> will behave more closely to load_balance(); I mean: collect
> statistics, classify the groups, then select the idlest one.

Okay, I'll watch for V4 and retest. It really seems to be limited to
the 8-node system. None of the other systems are showing this.
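Just to be sure I'm picturing that rework correctly, here is a little
userspace toy of the collect/classify/pick flow as I understand it.
All of the names and the classification rules below are my own
stand-ins, not your patch:

#include <stdio.h>

/*
 * Toy model of the proposed flow: gather per-group stats, classify
 * each group, then pick the idlest. Purely illustrative.
 */

enum group_type { GROUP_HAS_SPARE, GROUP_FULLY_BUSY, GROUP_OVERLOADED };

struct sg_stats {
	int id;
	int nr_running;
	int group_weight;	/* number of CPUs in the group */
	unsigned long load;
	enum group_type type;
};

/* Step 2: classify a group from its collected statistics. */
static enum group_type classify(const struct sg_stats *sg)
{
	if (sg->nr_running < sg->group_weight)
		return GROUP_HAS_SPARE;
	if (sg->nr_running == sg->group_weight)
		return GROUP_FULLY_BUSY;
	return GROUP_OVERLOADED;
}

/* Step 3: pick the idlest; the better class wins, load breaks ties. */
static const struct sg_stats *idler(const struct sg_stats *a,
				    const struct sg_stats *b)
{
	if (a->type != b->type)
		return a->type < b->type ? a : b;
	return b->load < a->load ? b : a;
}

int main(void)
{
	/* Step 1: statistics, hard-coded here instead of collected. */
	struct sg_stats g[] = {
		{ .id = 0, .nr_running = 20, .group_weight = 20, .load = 800 },
		{ .id = 1, .nr_running = 18, .group_weight = 20, .load = 900 },
	};
	const struct sg_stats *best = &g[0];

	for (int i = 0; i < 2; i++)
		g[i].type = classify(&g[i]);
	for (int i = 1; i < 2; i++)
		best = idler(best, &g[i]);

	printf("idlest group: %d\n", best->id);
	return 0;
}

The point being that classification dominates raw load: group 1 above
carries more load but still has spare capacity, so it wins.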
> What is the behavior of the lu.C threads? Are they waking up a lot,
> and could they trigger the slow wake path?

Yes, probably a fair bit of waking. It's an iterative equation-solving
code. It's fairly CPU-intensive but requires communication for
dependent calculations. That's part of why having them mis-balanced
causes such a large slowdown. I think at times everyone else waits for
the slow guys.
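For reference, my (simplified) mental model of the fast vs. slow wake
path, as a standalone toy. The structs and the placement logic are
stand-ins only; in the kernel the split lives in select_task_rq_fair():

#include <stdio.h>
#include <stdbool.h>

/*
 * Stand-in for the wake-path split: an affine wakeup takes the fast
 * path (a select_idle_sibling()-style search near prev_cpu), anything
 * else falls into the slow path that walks the domains via
 * find_idlest_group(). Not kernel code, just the shape of it.
 */

struct toy_task {
	int prev_cpu;
	bool want_affine;	/* e.g. waker and wakee share a cache */
};

static int fast_path(const struct toy_task *p)
{
	/* Fast path: stay near where the task last ran. */
	return p->prev_cpu;
}

static int slow_path(const struct toy_task *p)
{
	/*
	 * Slow path: this is where the local-vs-idlest group
	 * comparison under discussion lives. Pretend the group
	 * search picked CPU 0.
	 */
	(void)p;
	return 0;
}

int main(void)
{
	struct toy_task t = { .prev_cpu = 42, .want_affine = false };
	int cpu = t.want_affine ? fast_path(&t) : slow_path(&t);

	printf("wakeup placed on CPU %d\n", cpu);
	return 0;
}

If lu.C's threads keep missing the affine path, every one of those
wakeups goes through the group comparison above, which would explain
why a bug there hits this workload so hard.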
> > The initial fixes I made for this issue did not exhibit this
> > behavior. They would have had other issues dealing with overload
> > cases though. In this case however there are only 154 or 158
> > threads on 160 CPUs so not overloaded.
> >
> > I'll try to get my hands on this system and poke into it. I just
> > wanted to get your thoughts and let you know where we are.
>
> Thanks for testing

Sure! Thanks,

Phil

> Vincent
>
> > Thanks,
> > Phil
> >
> > System details:
> >
> > Architecture:          x86_64
> > CPU op-mode(s):        32-bit, 64-bit
> > Byte Order:            Little Endian
> > CPU(s):                160
> > On-line CPU(s) list:   0-159
> > Thread(s) per core:    2
> > Core(s) per socket:    10
> > Socket(s):             8
> > NUMA node(s):          8
> > Vendor ID:             GenuineIntel
> > CPU family:            6
> > Model:                 47
> > Model name:            Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz
> > Stepping:              2
> > CPU MHz:               1063.934
> > BogoMIPS:              4787.73
> > Virtualization:        VT-x
> > L1d cache:             32K
> > L1i cache:             32K
> > L2 cache:              256K
> > L3 cache:              30720K
> > NUMA node0 CPU(s):     0-9,80-89
> > NUMA node1 CPU(s):     10-19,90-99
> > NUMA node2 CPU(s):     20-29,100-109
> > NUMA node3 CPU(s):     30-39,110-119
> > NUMA node4 CPU(s):     40-49,120-129
> > NUMA node5 CPU(s):     50-59,130-139
> > NUMA node6 CPU(s):     60-69,140-149
> > NUMA node7 CPU(s):     70-79,150-159
> > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid dtherm ida arat
> >
> > $ numactl --hardware
> > available: 8 nodes (0-7)
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 80 81 82 83 84 85 86 87 88 89
> > node 0 size: 64177 MB
> > node 0 free: 60866 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 90 91 92 93 94 95 96 97 98 99
> > node 1 size: 64507 MB
> > node 1 free: 61167 MB
> > node 2 cpus: 20 21 22 23 24 25 26 27 28 29 100 101 102 103 104 105 106 107 108 109
> > node 2 size: 64507 MB
> > node 2 free: 61250 MB
> > node 3 cpus: 30 31 32 33 34 35 36 37 38 39 110 111 112 113 114 115 116 117 118 119
> > node 3 size: 64507 MB
> > node 3 free: 61327 MB
> > node 4 cpus: 40 41 42 43 44 45 46 47 48 49 120 121 122 123 124 125 126 127 128 129
> > node 4 size: 64507 MB
> > node 4 free: 60993 MB
> > node 5 cpus: 50 51 52 53 54 55 56 57 58 59 130 131 132 133 134 135 136 137 138 139
> > node 5 size: 64507 MB
> > node 5 free: 60892 MB
> > node 6 cpus: 60 61 62 63 64 65 66 67 68 69 140 141 142 143 144 145 146 147 148 149
> > node 6 size: 64507 MB
> > node 6 free: 61139 MB
> > node 7 cpus: 70 71 72 73 74 75 76 77 78 79 150 151 152 153 154 155 156 157 158 159
> > node 7 size: 64480 MB
> > node 7 free: 61188 MB
> > node distances:
> > node   0   1   2   3   4   5   6   7
> >   0:  10  12  17  17  19  19  19  19
> >   1:  12  10  17  17  19  19  19  19
> >   2:  17  17  10  12  19  19  19  19
> >   3:  17  17  12  10  19  19  19  19
> >   4:  19  19  19  19  10  12  17  17
> >   5:  19  19  19  19  12  10  17  17
> >   6:  19  19  19  19  17  17  10  12
> >   7:  19  19  19  19  17  17  12  10