On Tue, 30 Jun 2020 at 13:22, Qais Yousef <qais.you...@arm.com> wrote:
>
> This series attempts to address the report that uclamp logic could be
> expensive sometimes and shows a regression in netperf UDP_STREAM under
> certain conditions.
>
> The first patch is a fix for how struct uclamp_rq is initialized, which is
> required by the 2nd patch, which contains the real 'fix'.
>
> Worth noting that the root cause of the overhead is believed to be system
> specific or related to certain potential code/data layout issues, leading
> to worse I/D $ performance.
>
> Different systems exhibited different behaviors, and the regression did
> disappear in certain kernel versions while attempting to reproduce.
>
> More info can be found here:
>
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/
>
> Having the static key seemed the best thing to do to ensure the effect of
> uclamp is minimized for kernels that compile it in but don't have a
> userspace that uses it, which will allow distros to distribute uclamp
> capable kernels by default without having to compromise on performance
> for some systems that could be affected.
>
> Changes in v6:
>   * s/uclamp_is_enabled/uclamp_is_used/ + add comment
>   * Improve the bailout condition for the case where we could end up
>     with unbalanced calls of uclamp_rq_dec_id()
>   * Clarify some comments.
>
> Changes in v5:
>   * Fix a race that could happen when the order of enqueue/dequeue of
>     tasks A and B is not done in order, and sched_uclamp_used is enabled
>     in between.
>   * Add more comments explaining the race and the behavior of
>     uclamp_rq_util_with(), which is now protected with a static key to
>     be a NOP. When no uclamp aggregation at rq level is done, this
>     function can't do much.
>
> Changes in v4:
>   * Fix broken boosting of RT tasks when the static key is disabled.
>
> Changes in v3:
>   * Avoid double negatives and rename the static key to uclamp_used
>   * Unconditionally enable the static key through any of the paths where
>     the user can modify the default uclamp value.
>   * Use C99 named struct initializers for struct uclamp_rq, which are
>     easier to read than the memset().
>
> Changes in v2:
>   * Add more info in the commit message about the result of perf diff to
>     demonstrate that the activate/deactivate_task pressure is reduced in
>     the fast path.
>   * Fix sparse warning reported by the test robot.
>   * Add an extra commit about using static_branch_likely() instead of
>     static_branch_unlikely().
>
> Thanks
>
> --
> Qais Yousef
>
> Cc: Juri Lelli <juri.le...@redhat.com>
> Cc: Vincent Guittot <vincent.guit...@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggem...@arm.com>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Ben Segall <bseg...@google.com>
> Cc: Mel Gorman <mgor...@suse.de>
> Cc: Patrick Bellasi <patrick.bell...@matbug.net>
> Cc: Chris Redpath <chris.redp...@arm.com>
> Cc: Lukasz Luba <lukasz.l...@arm.com>
> Cc: linux-kernel@vger.kernel.org
>
> Qais Yousef (2):
>   sched/uclamp: Fix initialization of struct uclamp_rq
>   sched/uclamp: Protect uclamp fast path code with static key
I have run the perf bench sched pipe test that I had already run
previously, now with this v6, and the results are similar to my previous
tests: the impact is -1.61%, similar to v2, which is better than the
original -3.66% without your patch.

>
>  kernel/sched/core.c              | 95 ++++++++++++++++++++++++++++++--
>  kernel/sched/cpufreq_schedutil.c |  2 +-
>  kernel/sched/sched.h             | 47 +++++++++++++++-
>  3 files changed, 135 insertions(+), 9 deletions(-)
>
> --
> 2.17.1
>