PowerVM systems configured in shared processor mode pose some unique
challenges. Some device-tree properties are missing on a shared processor
LPAR, and hence some sched domains may not make sense for shared processor
systems.
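For readers reproducing this setup: whether a partition runs in shared
processor mode is visible from userspace via /proc/ppc64/lparcfg, which
carries a shared_processor_mode field on pHyp. Below is a minimal sketch;
the here-doc stands in for the real file so it runs anywhere, and its
sample values are illustrative, not taken from the systems measured here.

```shell
# Sketch: detect shared vs dedicated processor mode from lparcfg output.
# On a real PowerVM LPAR, read /proc/ppc64/lparcfg instead; the here-doc
# below is an illustrative stand-in (made-up values) so this runs anywhere.
lparcfg_sample() {
  cat <<'EOF'
partition_id=2
shared_processor_mode=1
partition_entitled_capacity=4000
EOF
}

mode=$(lparcfg_sample | awk -F= '$1 == "shared_processor_mode" { print $2 }')
if [ "$mode" = "1" ]; then
  echo "shared processor LPAR"
else
  echo "dedicated processor LPAR"
fi
```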
Most shared processor systems are over-provisioned. The underlying PowerVM
hypervisor schedules at a Big Core granularity. The most recent POWER
processors support two almost independent cores per Big Core. Under lightly
loaded conditions, it helps overall system performance if we pack to a
smaller number of Big Cores.

System Configuration
type=Shared mode=Capped smt=8 lcpu=128 mem=1066732224 kB cpus=96 ent=40.00
So a *40 Entitled cores / 128 Virtual processors* scenario.

lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               1024
On-line CPU(s) list:  0-1023
Model name:           POWER10 (architected), altivec supported
Model:                2.0 (pvr 0080 0200)
Thread(s) per core:   8
Core(s) per socket:   16
Socket(s):            8
Hypervisor vendor:    pHyp
Virtualization type:  para
L1d cache:            8 MiB (256 instances)
L1i cache:            12 MiB (256 instances)
NUMA node(s):         8
NUMA node0 CPU(s):    0-7,64-71,128-135,192-199,256-263,320-327,384-391,448-455,512-519,576-583,640-647,704-711,768-775,832-839,896-903,960-967
NUMA node1 CPU(s):    8-15,72-79,136-143,200-207,264-271,328-335,392-399,456-463,520-527,584-591,648-655,712-719,776-783,840-847,904-911,968-975
NUMA node2 CPU(s):    16-23,80-87,144-151,208-215,272-279,336-343,400-407,464-471,528-535,592-599,656-663,720-727,784-791,848-855,912-919,976-983
NUMA node3 CPU(s):    24-31,88-95,152-159,216-223,280-287,344-351,408-415,472-479,536-543,600-607,664-671,728-735,792-799,856-863,920-927,984-991
NUMA node4 CPU(s):    32-39,96-103,160-167,224-231,288-295,352-359,416-423,480-487,544-551,608-615,672-679,736-743,800-807,864-871,928-935,992-999
NUMA node5 CPU(s):    40-47,104-111,168-175,232-239,296-303,360-367,424-431,488-495,552-559,616-623,680-687,744-751,808-815,872-879,936-943,1000-1007
NUMA node6 CPU(s):    48-55,112-119,176-183,240-247,304-311,368-375,432-439,496-503,560-567,624-631,688-695,752-759,816-823,880-887,944-951,1008-1015
NUMA node7 CPU(s):    56-63,120-127,184-191,248-255,312-319,376-383,440-447,504-511,568-575,632-639,696-703,760-767,824-831,888-895,952-959,1016-1023

ebizzy -t 40 -S 200 (5 iterations)
Records per second. (Higher is better)
Kernel   N  Min      Max      Median   Avg        Stddev     %Change
v6.5     5  4664647  5148125  5130549  5043050.2  211756.06
+patch   5  4769453  5220808  5137476  5040333.8  193586.43  -0.0538642

From lparstat (when the workload stabilized)
Kernel  %user  %sys  %wait  %idle  physc  %entc   lbusy  app    vcsw       phint
v6.5    6.23   0.00  0.00   93.77  40.06  100.15  6.23   55.92  138699651  100
+patch  6.26   0.01  0.00   93.73  21.15  52.87   6.27   74.78  71743299   148

ebizzy -t 80 -S 200 (5 iterations)
Records per second. (Higher is better)
Kernel   N  Min      Max      Median   Avg        Stddev     %Change
v6.5     5  8735907  9121401  8986218  8967125.6  152793.38
+patch   5  9636679  9990229  9765958  9770081.8  143913.29  8.95444

From lparstat (when the workload stabilized)
Kernel  %user  %sys  %wait  %idle  physc  %entc   lbusy  app    vcsw      phint
v6.5    12.40  0.01  0.00   87.60  71.05  177.62  12.40  24.61  98047012  85
+patch  12.47  0.02  0.00   87.50  41.06  102.65  12.50  54.90  77821678  158

ebizzy -t 160 -S 200 (5 iterations)
Records per second. (Higher is better)
Kernel   N  Min       Max       Median    Avg       Stddev     %Change
v6.5     5  12378356  12946633  12780732  12682369  266135.73
+patch   5  16756702  17676670  17406971  17341585  346054.89  36.7377

From lparstat (when the workload stabilized)
Kernel  %user  %sys  %wait  %idle  physc  %entc   lbusy  app    vcsw       phint
v6.5    24.56  0.09  0.15   75.19  77.42  193.55  24.65  17.94  135625276  98
+patch  24.78  0.03  0.00   75.19  78.33  195.83  24.81  17.17  107826112  215

-------------------------------------------------------------------------

System Configuration
type=Shared mode=Capped smt=8 lcpu=40 mem=1066732672 kB cpus=96 ent=40.00
So a *40 Entitled cores / 40 Virtual processors* scenario.
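For reference, the %Change column in the ebizzy tables here is derived from
the Avg column as 100 * (patched - baseline) / baseline. A small sketch of
that arithmetic, using the 160-thread Avg figures from the 128-virtual-
processor run:

```shell
# %Change between the baseline (v6.5) and patched averages, as reported
# in the ebizzy tables: 100 * (patched - base) / base.
pct_change() {
  awk -v base="$1" -v patched="$2" \
      'BEGIN { printf "%.4f\n", (patched - base) * 100 / base }'
}

pct_change 12682369 17341585   # 160-thread Avg values -> 36.7377
```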
lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               320
On-line CPU(s) list:  0-319
Model name:           POWER10 (architected), altivec supported
Model:                2.0 (pvr 0080 0200)
Thread(s) per core:   8
Core(s) per socket:   10
Socket(s):            4
Hypervisor vendor:    pHyp
Virtualization type:  para
L1d cache:            2.5 MiB (80 instances)
L1i cache:            3.8 MiB (80 instances)
NUMA node(s):         4
NUMA node0 CPU(s):    0-7,32-39,64-71,96-103,128-135,160-167,192-199,224-231,256-263,288-295
NUMA node1 CPU(s):    8-15,40-47,72-79,104-111,136-143,168-175,200-207,232-239,264-271,296-303
NUMA node2 CPU(s):    16-23,48-55,80-87,112-119,144-151,176-183,208-215,240-247,272-279,304-311
NUMA node3 CPU(s):    24-31,56-63,88-95,120-127,152-159,184-191,216-223,248-255,280-287,312-319

ebizzy -t 40 -S 200 (5 iterations)
Records per second. (Higher is better)
Kernel   N  Min      Max      Median   Avg        Stddev     %Change
v6.5     5  4966196  5148045  5078348  5072977.4  66572.122
+patch   5  5035210  5232882  5158456  5151734    78906.893  1.55247

From lparstat (when the workload stabilized)
Kernel  %user  %sys  %wait  %idle  physc  %entc   lbusy  app    vcsw     phint
v6.5    12.58  0.02  0.00   87.41  40.00  100.00  12.59  55.97  1029603  82
+patch  12.58  0.02  0.00   87.40  21.16  52.90   12.60  74.82  1188571  657

ebizzy -t 80 -S 200 (5 iterations)
Records per second. (Higher is better)
Kernel   N  Min       Max       Median    Avg       Stddev     %Change
v6.5     5  10081713  10162128  10145721  10128119  35603.196
+patch   5  9928483   10430256  10338097  10218466  221155.16  0.892041

From lparstat (when the workload stabilized)
Kernel  %user  %sys  %wait  %idle  physc  %entc   lbusy  app    vcsw     phint
v6.5    25.02  0.06  0.00   74.93  40.00  100.00  25.07  55.99  1530297  92
+patch  25.03  0.04  0.00   74.93  40.00  100.00  25.07  55.99  2475875  667

ebizzy -t 160 -S 200 (5 iterations)
Records per second. (Higher is better)
Kernel   N  Min      Max      Median   Avg        Stddev     %Change
v6.5     5  9064802  9169798  9115250  9123968.2  44901.261
+patch   5  9064533  9235200  9072374  9119558.2  76260.411  -0.0483342

From lparstat (when the workload stabilized)
Kernel  %user  %sys  %wait  %idle  physc  %entc   lbusy  app    vcsw     phint
v6.5    49.94  0.03  0.00   50.03  40.06  100.15  49.97  55.99  2058879  93
+patch  49.94  0.03  0.00   50.03  40.06  100.15  49.97  55.99  2058879  93

-------------------------------------------------------------------------

Observation:
We see an improvement in ebizzy throughput at almost half the core
utilization in low-utilization scenarios, while still retaining throughput
in mid- and high-utilization scenarios.

Note: The numbers above are for the uncapped + no-noise case. In the
capped and/or noisy case, due to contention on the cores, the numbers are
expected to improve further.

Srikar Dronamraju (4):
  powerpc/smp: Cache CPU has Asymmetric SMP
  powerpc/smp: Move shared_processor static key to smp.h
  powerpc/smp: Enable Asym packing for cores on shared processor
  powerpc/smp: Disable MC domain for shared processor

 arch/powerpc/include/asm/paravirt.h | 12 -----------
 arch/powerpc/include/asm/smp.h      | 14 +++++++++++++
 arch/powerpc/kernel/smp.c           | 31 +++++++++++++++++++----------
 3 files changed, 35 insertions(+), 22 deletions(-)

--
2.41.0