This is a respin of: https://lore.kernel.org/lkml/20181030160947.19581-1-patrick.bell...@arm.com/
rebased on v4.20-rc1, which addresses Peter's comments and also adds a
couple of cleanup patches on top.

Tests on a 40-CPU Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz system still
report the ~10-15% Execl Throughput improvement after applying the first
patch.

Those benefits disappear if we remove the additional test on
"current == p" which Peter was asking about. I guess the race condition
described in the new inline comment I've now added could be the reason
the additional test is required, but I have not actually verified that
guess. I've just kept both conditions but swapped them, since we are
probably more likely to call cpu_util_without() with a task which is
eventually marked as task_on_rq_queued().

The second patch is pretty simple, while the last one implements what
Peter suggested in the previous review. I did not use something similar
to sub_positive, as Peter suggested, only because in my tests that
implementation seems to negatively affect the Execl Throughput test
results, reducing the speedup we get with the proposed version.

Best,
Patrick

Patrick Bellasi (3):
  sched/fair: util_est: fix cpu_util_wake for execl
  sched/fair: util_est: mask UTIL_AVG_UNCHANGED usages
  sched/fair: add lsub_positive and use it consistently

 kernel/sched/fair.c | 85 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 21 deletions(-)

-- 
2.18.0