On 2019/4/28 20:17, Ingo Molnar wrote:
>
> * Aubrey Li <aubrey.in...@gmail.com> wrote:
>
>> On Sun, Apr 28, 2019 at 5:33 PM Ingo Molnar <mi...@kernel.org> wrote:
>>> So because I'm a big fan of presenting data in a readable fashion, here
>>> are your results, tabulated:
>>
>> I thought I tried my best to make it readable, but this one looks much
>> better, thanks, ;-)
>>>
>>> #
>>> # Sysbench throughput comparison of 3 different kernels at different
>>> # load levels, higher numbers are better:
>>> #
>>>
>>> .------------------------------|-------------------------------------------------------------.
>>> | NA/AVX vanilla-SMT [stddev%] | coresched-SMT [stddev%]  +/- |        no-SMT [stddev%]  +/- |
>>> |------------------------------|-------------------------------------------------------------|
>>> | 1/1         508.5 [   0.2% ] |     504.7 [   1.1% ]    0.8% |     509.0 [   0.2% ]    0.1% |
>>> | 2/2        1000.2 [   1.4% ] |    1004.1 [   1.6% ]    0.4% |     997.6 [   1.2% ]    0.3% |
>>> | 4/4        1912.1 [   1.0% ] |    1904.2 [   1.1% ]    0.4% |    1914.9 [   1.3% ]    0.1% |
>>> | 8/8        3753.5 [   0.3% ] |    3748.2 [   0.3% ]    0.1% |    3751.3 [   0.4% ]    0.1% |
>>> | 16/16      7139.3 [   2.4% ] |    7137.9 [   1.8% ]    0.0% |    7049.2 [   2.4% ]    1.3% |
>>> | 32/32     10899.0 [   4.2% ] |   10780.3 [   4.4% ]   -1.1% |   10339.2 [   9.6% ]   -5.1% |
>>> | 64/64     15086.1 [  11.5% ] |   14262.0 [   8.2% ]   -5.5% |   11168.7 [  22.2% ]  -26.0% |
>>> | 128/128   15371.9 [  22.0% ] |   14675.8 [  14.4% ]   -4.5% |   10963.9 [  18.5% ]  -28.7% |
>>> | 256/256   15990.8 [  22.0% ] |   12227.9 [  10.3% ]  -23.5% |   10469.9 [  19.6% ]  -34.5% |
>>> '------------------------------|-------------------------------------------------------------'
>>>
>>> One major thing that sticks out is that if we compare the stddev numbers
>>> to the +/- comparisons then it's pretty clear that the benchmarks are
>>> very noisy: in all but the last row stddev is actually higher than the
>>> measured effect.
>>>
>>> So what does 'stddev' mean here, exactly?
>>> The stddev of multiple runs, i.e. measured run-to-run variance? Or is it
>>> some internal metric of the benchmark?
>>>
>>
>> The benchmark reports intermediate statistics once per second; the raw
>> log looks like this:
>>  [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
>>  [ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
>>  [ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
>>  [ 14s ] thds: 256 eps: 13752.31 lat (ms,95%): 43.39
>>  [ 15s ] thds: 256 eps: 15362.79 lat (ms,95%): 43.39
>>  [ 16s ] thds: 256 eps: 26580.65 lat (ms,95%): 35.59
>>  [ 17s ] thds: 256 eps: 15011.78 lat (ms,95%): 36.89
>>  [ 18s ] thds: 256 eps: 15025.78 lat (ms,95%): 39.65
>>  [ 19s ] thds: 256 eps: 15350.87 lat (ms,95%): 39.65
>>  [ 20s ] thds: 256 eps: 15491.70 lat (ms,95%): 36.89
>>
>> I have a python script to parse eps (events per second) and lat (latency)
>> out, and compute the average and stddev. (And I can draw a curve locally).
>>
>> It is indeed noisy when the number of tasks is greater than the number of
>> CPUs. It's probably caused by frequent load balancing and context switching.
>
> Ok, so it's basically an internal workload noise metric, it doesn't
> represent the run-to-run noise.
>
> So it's the real stddev of the workload - but we don't know whether the
> measured performance figure is exactly in the middle of the runtime
> probability distribution.
>
>> Do you have any suggestions? Or any other information I can provide?
>
> Yeah, so we don't just want to know the "standard deviation" of the
> measured throughput values, but also the "standard error of the mean".
>
> I suspect it's pretty low, below 1% for all rows?
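
For reference, here is a minimal sketch of the computation being asked for. It is not the script used for the numbers in this thread (that script is not shown). The names parse_eps, summarize and SAMPLE_LOG are made up for illustration, and the regex assumes the interval-line format quoted above. It takes sem = stddev / sqrt(n), which treats the one-second samples as independent; if consecutive intervals are correlated, this will underestimate the real error.

```python
import math
import re
import statistics

# Matches interval lines of the form quoted above, e.g.:
#   [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
# Only the eps (events per second) field is captured.
LINE_RE = re.compile(r"\[\s*\d+s\s*\]\s+thds:\s*\d+\s+eps:\s*([\d.]+)")

# The ten sample lines from the quoted log (hypothetical variable name).
SAMPLE_LOG = """
[ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
[ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
[ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
[ 14s ] thds: 256 eps: 13752.31 lat (ms,95%): 43.39
[ 15s ] thds: 256 eps: 15362.79 lat (ms,95%): 43.39
[ 16s ] thds: 256 eps: 26580.65 lat (ms,95%): 35.59
[ 17s ] thds: 256 eps: 15011.78 lat (ms,95%): 36.89
[ 18s ] thds: 256 eps: 15025.78 lat (ms,95%): 39.65
[ 19s ] thds: 256 eps: 15350.87 lat (ms,95%): 39.65
[ 20s ] thds: 256 eps: 15491.70 lat (ms,95%): 36.89
"""


def parse_eps(log_text):
    """Return the per-interval events-per-second samples found in log_text."""
    return [float(m.group(1)) for m in LINE_RE.finditer(log_text)]


def summarize(samples):
    """Return (mean, std%, sem%), with std and sem relative to the mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)   # sample standard deviation (n - 1)
    sem = std / math.sqrt(n)          # standard error of the mean
    return mean, 100.0 * std / mean, 100.0 * sem / mean


mean, std_pct, sem_pct = summarize(parse_eps(SAMPLE_LOG))
print(f"mean={mean:.1f} std%={std_pct:.1f} sem%={sem_pct:.1f}")
```

With only ten samples and one outlier (the 16s interval), the figures from this excerpt will not match any row of the tables; the full log would be needed for that.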
I hope my mailbox keeps the formatting of this one...

.--------------------------------------------------------------------------------------------------------.
| NA/AVX vanilla-SMT [std% / sem%] | coresched-SMT [std% / sem%]  +/- |        no-SMT [std% / sem%]  +/- |
|--------------------------------------------------------------------------------------------------------|
| 1/1          508.5 [ 0.2%/ 0.0%] |      504.7 [ 1.1%/ 0.1%]   -0.8% |      509.0 [ 0.2%/ 0.0%]    0.1% |
| 2/2         1000.2 [ 1.4%/ 0.1%] |     1004.1 [ 1.6%/ 0.2%]    0.4% |      997.6 [ 1.2%/ 0.1%]   -0.3% |
| 4/4         1912.1 [ 1.0%/ 0.1%] |     1904.2 [ 1.1%/ 0.1%]   -0.4% |     1914.9 [ 1.3%/ 0.1%]    0.1% |
| 8/8         3753.5 [ 0.3%/ 0.0%] |     3748.2 [ 0.3%/ 0.0%]   -0.1% |     3751.3 [ 0.4%/ 0.0%]   -0.1% |
| 16/16       7139.3 [ 2.4%/ 0.2%] |     7137.9 [ 1.8%/ 0.2%]   -0.0% |     7049.2 [ 2.4%/ 0.2%]   -1.3% |
| 32/32      10899.0 [ 4.2%/ 0.4%] |    10780.3 [ 4.4%/ 0.4%]   -1.1% |    10339.2 [ 9.6%/ 0.9%]   -5.1% |
| 64/64      15086.1 [11.5%/ 1.2%] |    14262.0 [ 8.2%/ 0.8%]   -5.5% |    11168.7 [22.2%/ 1.7%]  -26.0% |
| 128/128    15371.9 [22.0%/ 2.2%] |    14675.8 [14.4%/ 1.4%]   -4.5% |    10963.9 [18.5%/ 1.4%]  -28.7% |
| 256/256    15990.8 [22.0%/ 2.2%] |    12227.9 [10.3%/ 1.0%]  -23.5% |    10469.9 [19.6%/ 1.7%]  -34.5% |
'--------------------------------------------------------------------------------------------------------'

Thanks,
-Aubrey
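
P.S. One way to read the table above is to set each +/- value against the sem% of the two kernels being compared. This is only a rough sketch and not part of the original measurement. The helper names are hypothetical, and adding the two relative errors in quadrature is an assumption (first-order error propagation for independent measurements). The input numbers are the vanilla-SMT and coresched-SMT entries of the 256/256 row.

```python
import math


def relative_delta_pct(baseline, candidate):
    """Percentage change of candidate relative to baseline (the +/- column)."""
    return 100.0 * (candidate - baseline) / baseline


def combined_sem_pct(sem_a_pct, sem_b_pct):
    """Rough noise estimate for a comparison: relative errors in quadrature."""
    return math.hypot(sem_a_pct, sem_b_pct)


# 256/256 row: vanilla-SMT vs. coresched-SMT (mean, sem%)
base_mean, base_sem = 15990.8, 2.2
core_mean, core_sem = 12227.9, 1.0

delta = relative_delta_pct(base_mean, core_mean)
noise = combined_sem_pct(base_sem, core_sem)
print(f"delta = {delta:.1f}%, combined sem = {noise:.1f}%")
```

If the delta is several times larger than the combined sem, the difference is unlikely to be noise in the mean. This still says nothing about run-to-run variation, which the per-second samples of a single run do not capture.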