On Mon, Jan 25, 2021 at 12:29:47PM +0800, Li, Aubrey wrote:
> >>> hackbench -l 2560 -g 1 on 8 cores arm64
> >>> v5.11-rc4 : 1.355 (+/- 7.96)
> >>> + sis improvement : 1.923 (+/- 25%)
> >>> + the patch below : 1.332 (+/- 4.95)
> >>>
> >>> hackbench -l 2560 -g 256 on 8 cores arm64
> >>> v5.11-rc4 : 2.116 (+/- 4.62%)
> >>> + sis improvement : 2.216 (+/- 3.84%)
> >>> + the patch below : 2.113 (+/- 3.01%)
> >>>
>
> 4 benchmarks reported out during weekend, with patch 3 on a x86 4s system
> with 24 cores per socket and 2 HT per core, total 192 CPUs.
>
> It looks like mid-load has notable changes on my side:
> - netperf 50% num of threads in TCP mode has 27.25% improved
> - tbench 50% num of threads has 9.52% regression
>

It's interesting that patch 3 would make any difference on x64 given that
it's SMT2. The scan depth should have been similar. It's somewhat expected
that it will not be a universal win, particularly once the utilisation
is high enough to spill over in sched domains (25%, 50%, 75% utilisation
being interesting on 4-socket systems). In such cases, double scanning can
still show improvements for workloads that idle rapidly like tbench and
hackbench even though it's expensive. The extra scanning gives more time
for a CPU to go idle enough to be selected which can improve throughput
but at the cost of wake-up latency.

Hopefully v4 can be tested as well which is now just a single scan.

-- 
Mel Gorman
SUSE Labs