On Tue, Jun 28, 2016 at 02:43:36PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 12:43:22PM -0500, Josh Poimboeuf wrote:
> > NOTE: I didn't include any performance numbers because I wasn't able to
> > get consistent results.  I tried the following on a Xeon E5-2420 v2 CPU:
> > 
> >   $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo -n performance > $i; done
> >   $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> >   $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> >   $ echo 0 > /proc/sys/kernel/nmi_watchdog
> >   $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 1000000
> > 
> > I was going to post the numbers from that, both with and without
> > SCHEDSTATS, but then when I tried to repeat the test on a different day,
> > the results were surprisingly different, with different conclusions.
> > 
> > So any advice on measuring scheduler performance would be appreciated...
> 
> Yeah, it's a bit of a pain in general...
> 
> A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 | grep "seconds time elapsed"
> B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep "seconds time elapsed"
> 
> 1) tip/master + 1-4
> 2) tip/master + 1-5
> 3) tip/master + 1-5 + below
> 
>       1               2               3
> 
> A)    4.627767855     4.650429917     4.646208062
>       4.633921933     4.641424424     4.612021058
>       4.649536375     4.663144144     4.636815948
>       4.630165619     4.649053552     4.613022902
> 
> B)    1.770732957     1.789534273     1.773334291
>       1.761740716     1.795618428     1.773338681
>       1.763761666     1.822316496     1.774385589
> 
> 
> From this it looks like patch 5 does hurt a wee bit, but we can get most
> of that back by reordering the structure a bit. The results seem
> 'stable' across rebuilds and reboots (I've popped all patches,
> rebuilt, rebooted and re-benched 1 at the end and obtained similar
> results).
> 
> Although it's possible that if we reorder first and then do 5, we'll just
> see a bigger regression. I've not bothered.

Thanks a lot for benchmarking this!  And also for improving the cache
alignments.  Your changes look good to me.
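
For a rough sense of scale, the quoted table can be reduced to a percent change per configuration. This is only a sketch: the values are copied from the table above, the names (`RESULTS`, `relative_change`, `summarize`) are made up for illustration, and a plain mean over three or four runs says nothing about statistical significance.

```python
from statistics import mean

# Elapsed seconds copied from the quoted table.
# Columns: 1 = tip/master + 1-4, 2 = tip/master + 1-5,
#          3 = tip/master + 1-5 + the structure reordering.
RESULTS = {
    "A": {  # perf bench sched messaging -g 50 -l 5000
        1: [4.627767855, 4.633921933, 4.649536375, 4.630165619],
        2: [4.650429917, 4.641424424, 4.663144144, 4.649053552],
        3: [4.646208062, 4.612021058, 4.636815948, 4.613022902],
    },
    "B": {  # taskset 1 perf bench sched pipe
        1: [1.770732957, 1.761740716, 1.763761666],
        2: [1.789534273, 1.795618428, 1.822316496],
        3: [1.773334291, 1.773338681, 1.774385589],
    },
}


def relative_change(baseline, candidate):
    """Percent change of mean(candidate) relative to mean(baseline)."""
    base = mean(baseline)
    return 100.0 * (mean(candidate) - base) / base


def summarize(results):
    """Return (benchmark, config, percent change vs. config 1) tuples."""
    rows = []
    for bench, configs in results.items():
        for config in (2, 3):
            pct = relative_change(configs[1], configs[config])
            rows.append((bench, config, pct))
    return rows


if __name__ == "__main__":
    for bench, config, pct in summarize(RESULTS):
        # Positive means slower than the baseline (more elapsed time).
        print(f"{bench}) config {config} vs 1: {pct:+.2f}%")
```

On these numbers the pipe benchmark (B) comes out roughly 2% slower with patch 5 and under 0.5% slower once the structure is reordered, which matches the "get most of that back" reading above.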

-- 
Josh
