On 10/10/16 19:29, Vincent Guittot wrote:
> On 10 October 2016 at 15:54, Dietmar Eggemann <dietmar.eggem...@arm.com> wrote:
>> On 10/10/16 13:29, Vincent Guittot wrote:
>>> On 10 October 2016 at 12:01, Matt Fleming <m...@codeblueprint.co.uk> wrote:
>>>> On Sun, 09 Oct, at 11:39:27AM, Wanpeng Li wrote:

[...]

>>> I have tried to reproduce your issue on my target, a hikey board (ARM
>>> based octo cores) but i failed to see a regression with commit
>>> 7dc603c9028e. Nevertheless, i can see tasks not being well spread
>>
>> Wasn't this about the two patches mentioned in this thread? The one from
>> Matt using 'se->sum_exec_runtime' in the if condition in
>> enqueue_entity_load_avg() and Peterz's conditional call to
>> update_rq_clock(rq) in enqueue_task()?
>
> I was trying to reproduce the regression that Matt mentioned at the
> beginning of the thread, not those linked to proposed fixes

OK.

>
>>
>>> during fork as you mentioned. So I have studied a bit more the
>>> spreading issue during fork last week and i have a new version of my
>>> proposed patch that i'm going to send soon. With this patch, i can see
>>> a good spread of tasks during the fork sequence and some kind of perf
>>> improvement even if it's a bit difficult as the variance is quite
>>> important with hackbench test so it's mainly an improvement of
>>> repeatability of the result
>>
>> Hikey (ARM64 2x4 cpus) board: cpufreq: performance, cpuidle: disabled
>>
>> Performance counter stats for 'perf bench sched messaging -g 20 -l 500'
>> (10 runs):
>>
>> (1) tip/sched/core: commit 447976ef4fd0
>>
>> 5.902209533 seconds time elapsed ( +- 0.31% )
>
> This seems to be too long to test the impact of the forking phase of hackbench

[...]

Yeah, you're right. But I can't see any significant difference. IMHO,
it's all in the noise.

(A) Performance counter stats for 'perf bench sched messaging -g 100 -l 1 -t'

    # 20 sender and receiver threads per group
    # 100 groups == 4000 threads run

    (1) tip/sched/core: commit 447976ef4fd0

        Total time: 0.188 [sec]

    (2) tip/sched/core + original patch on the 'sched/fair: Do not decay
        new task load on first enqueue' thread (23/09/16)

        Total time: 0.199 [sec]

    (3) tip/sched/core + Peter's ENQUEUE_NEW patch on the 'sched/fair: Do
        not decay new task load on first enqueue' thread (28/09/16)

        Total time: 0.178 [sec]

(B) hackbench -P -g 1

    Running in process mode with 1 groups using 40 file descriptors
    each (== 40 tasks)
    Each sender will pass 100 messages of 100 bytes

    (1) 0.067

    (2) 0.083

    (3) 0.073

(C) hackbench -T -g 1

    Running in threaded mode with 1 groups using 40 file descriptors
    each (== 40 tasks)
    Each sender will pass 100 messages of 100 bytes

    (1) 0.077

    (2) 0.079

    (3) 0.072

Maybe, instead of the performance gov, I should pin the frequency to a
lower one to eliminate the thermal influence on this Hikey board.
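
For anyone who wants the (A), (B) and (C) timings expressed relative to
the (1) baseline, here is a minimal Python sketch. Only the timings come
from the runs listed in this mail; the RESULTS layout and the two helper
names are made up for illustration. These timings carry no variance
figure, so the percentages on their own cannot tell a real change from
noise.

```python
# Timings in seconds, copied from the (A), (B) and (C) runs in this mail.
# Keys 1, 2, 3 correspond to the kernel variants (1), (2), (3).
# RESULTS and both helpers are hypothetical names, not part of any tool.
RESULTS = {
    "A": {1: 0.188, 2: 0.199, 3: 0.178},
    "B": {1: 0.067, 2: 0.083, 3: 0.073},
    "C": {1: 0.077, 2: 0.079, 3: 0.072},
}


def relative_delta_percent(baseline, value):
    """Return the change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def deltas_vs_baseline(results):
    """Map each test to {variant: percent change against variant 1}."""
    out = {}
    for test, timings in results.items():
        base = timings[1]
        out[test] = {
            variant: relative_delta_percent(base, seconds)
            for variant, seconds in timings.items()
            if variant != 1
        }
    return out


if __name__ == "__main__":
    for test, deltas in deltas_vs_baseline(RESULTS).items():
        for variant, pct in sorted(deltas.items()):
            # Positive means slower than (1), negative means faster.
            print(f"({test}) variant ({variant}): {pct:+.1f}% vs (1)")
```

A positive value means the variant took longer than (1). Repeating each
run several times and feeding the means in instead would make the
comparison more meaningful.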