On Sun, Apr 23, 2017 at 5:31 PM, Doug Smythies <dsmyth...@telus.net> wrote:
> On 2017.04.22 14:08 Rafael wrote:
>> On Friday, April 21, 2017 11:29:06 PM Doug Smythies wrote:
>>> On 2017.04.20 18:18 Rafael wrote:
>>>> On Thursday, April 20, 2017 07:55:57 AM Doug Smythies wrote:
>>>>> On 2017.04.19 01:16 Mel Gorman wrote:
>>>>>> On Fri, Apr 14, 2017 at 04:01:40PM -0700, Doug Smythies wrote:
>>>>>>> Hi Mel,
>>>>
>>>> [cut]
>>>>
>>>>>> And the revert does help albeit not being an option for reasons Rafael
>>>>>> covered.
>>>>>
>>>>> New data point: Kernel 4.11-rc7 intel_pstate, powersave forcing the
>>>>> load based algorithm: Elapsed 3178 seconds.
>>>>>
>>>>> If I understand your data correctly, my load based results are the
>>>>> opposite of yours.
>>>>>
>>>>> Mel: 4.11-rc5 vanilla: Elapsed mean: 3750.20 Seconds
>>>>> Mel: 4.11-rc5 load based: Elapsed mean: 2503.27 Seconds
>>>>> Or: 33.25%
>>>>>
>>>>> Doug: 4.11-rc6 stock: Elapsed total (5 runs): 2364.45 Seconds
>>>>> Doug: 4.11-rc7 force load based: Elapsed total (5 runs): 3178 Seconds
>>>>> Or: -34.4%
>>>>
>>>> I wonder if you can do the same thing I've just advised Mel to do. That is,
>>>> take my linux-next branch:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>>>>
>>>> (which is new material for 4.12 on top of 4.11-rc7) and reduce
>>>> INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL (in intel_pstate.c) in it by 1/2
>>>> (force load-based if need be, I'm not sure what the PM profile of your
>>>> test system is).
>>>
>>> I did not need to force load-based. I do not know how to figure it out from
>>> an acpidump the way Srinivas does. I did a trace and figured out which
>>> algorithm it was using from the data.
>>>
>>> Reference test, before changing INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL:
>>> 3239.4 seconds.
>>>
>>> Test after changing INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL:
>>> 3195.5 seconds.
>>
>> So it does have an effect, but relatively small.
>
> I don't know how repeatable the test results are.
> i.e. I don't know if the 1.36% change is within experimental
> error or not. That being said, the trend does seem consistent.
>
>> I wonder if further reducing INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL to 2 ms
>> will make any difference.
>
> I went all the way to 1 ms, just for the test:
> 3123.9 Seconds
>
>>> By far, and with any code, I get the fastest elapsed time, of course next
>>> to performance mode, but not by much, by limiting the test to only use
>>> just 1 cpu: 1814.2 Seconds.
>>
>> Interesting.
>>
>> It looks like the cost is mostly related to moving the load from one CPU to
>> another and waiting for the new one to ramp up then.
>>
>> I guess the workload consists of many small tasks that each start on new CPUs
>> and cause that ping-pong to happen.
>
> Yes, and (from trace data) many tasks are very very very small. Also the test
> appears to take a few holidays, of up to 1 second, during execution.
>
>>> (performance governor, restated from a previous e-mail: 1776.05 seconds)
>>
>> But that causes the processor to stay in the maximum sustainable P-state all
>> the time, which on Sandy Bridge is quite costly energetically.
>
> Agreed. I only provide these data points as a reference and so that we know
> what the boundary conditions (limits) are.
>
>> We can do one more trick I forgot about. Namely, if we are about to increase
>> the P-state, we can jump to the average of the target and the max
>> instead of just the target, like in the appended patch (on top of linux-next).
>>
>> That will make the P-state selection really aggressive, so costly
>> energetically, but it should cause small jumps of the average load above 0
>> to produce big jumps of the target P-state.
>
> I'm already seeing the energy costs of some of this stuff.
> 3050.2 Seconds.
Is this with or without reducing the sampling interval?

> Idle power 4.06 Watts.
>
> Idle power for kernel 4.11-rc7 (performance-based): 3.89 Watts.
> Idle power for kernel 4.11-rc7, using load-based: 4.01 Watts.
> Idle power for kernel 4.11-rc7 next linux-pm: 3.91 Watts.

Power draw differences are not dramatic, so this might be a viable change
depending on the influence on the results elsewhere.

Anyway, your results are somewhat counter-intuitive.

Would it be possible to run this workload with the linux-next branch and the
schedutil governor and see if the patch at

https://patchwork.kernel.org/patch/9671829/

makes any difference?

Thanks,
Rafael
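For anyone reproducing the requested schedutil run: the governor is selected per CPU through the standard cpufreq sysfs interface. A minimal sketch (requires a kernel built with CONFIG_CPU_FREQ_GOV_SCHEDUTIL, and root; this is a generic configuration fragment, not part of the patch under discussion):

```shell
# Select the schedutil governor on every CPU via the cpufreq sysfs ABI.
for gov in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    echo schedutil > "$gov"
done

# Verify the active governor on cpu0:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```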