On 8 November 2013 01:04, Rowand, Frank <frank.row...@sonymobile.com> wrote:
> Hi Vincent,
>
> Thanks for creating some benchmark numbers!

You're welcome.

>
>
> On Thursday, November 07, 2013 5:33 AM, Vincent Guittot [vincent.guit...@linaro.org] wrote:
>>
>> On 7 November 2013 12:32, Catalin Marinas <catalin.mari...@arm.com> wrote:
>> > Hi Vincent,
>> >
>> > (for whatever reason, the text is wrapped and results hard to read)
>>
>> Yes, I have just seen that. It looks like Gmail has wrapped the lines.
>> I have added the results, which should not be wrapped, at the end of
>> this email.
>>
>> >
>> >
>> > On Thu, Nov 07, 2013 at 10:54:30AM +0000, Vincent Guittot wrote:
>> >> During the Energy-aware scheduling mini-summit, we spoke about benches
>> >> that should be used to evaluate the modifications of the scheduler.
>> >> I’d like to propose a bench that uses cyclictest to measure the wake
>> >> up latency and the power consumption. The goal of this bench is to
>> >> exercise the scheduler with various sleeping periods and get the
>> >> average wakeup latency. The range of the sleeping period must cover
>> >> all residency times of the idle state table of the platform. I have
>> >> run such tests on a tc2 platform with the packing tasks patchset.
>> >> I have used the following command:
>> >> #cyclictest -t <number of cores> -q -e 10000000 -i <500-12000> -d 150 -l 2000
>
> The number of loops ("-l 2000") should be much larger to create useful
> results.  I don't have a specific number that is large enough, I just
> know from experience that 2000 is way too small.  For example, running
> cyclictest several times with the same values on my laptop gives values
> that are not consistent:

The Avg figures look almost stable IMO. Are you referring to the Max
values when you mention inconsistency?
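
To make that comparison concrete, here is a minimal sketch in Python that
measures how much Avg and Max move between the four runs quoted below. The
numbers are copied by hand from those runs; the helper name `spread` and the
choice of "range divided by mean" as the stability measure are my own
assumptions, not something cyclictest reports.

```python
from statistics import mean

# (Avg, Max) in microseconds for threads T0..T3, copied from the four
# cyclictest runs quoted in this thread (one inner list per run).
runs = [
    [(77, 243), (68, 226), (81, 1017), (80, 260)],
    [(74, 390), (61, 810), (74, 287), (79, 551)],
    [(72, 252), (77, 354), (78, 1143), (70, 238)],
    [(68, 213), (65, 1279), (69, 244), (76, 242)],
]


def spread(values):
    """Return (lowest, highest, range relative to the mean) for one metric."""
    lo, hi = min(values), max(values)
    return lo, hi, (hi - lo) / mean(values)


# Collect the per-thread spread of Avg and of Max across the runs.
avg_spreads = []
max_spreads = []
for thread in range(4):
    avg_spreads.append(spread([run[thread][0] for run in runs]))
    max_spreads.append(spread([run[thread][1] for run in runs]))

for thread in range(4):
    a_lo, a_hi, a_rel = avg_spreads[thread]
    m_lo, m_hi, m_rel = max_spreads[thread]
    print(f"T{thread}: Avg {a_lo}..{a_hi} ({a_rel:.0%})  "
          f"Max {m_lo}..{m_hi} ({m_rel:.0%})")
```

A relative range is only a rough indicator with four samples per thread, but
it separates the two metrics without relying on eyeballing the columns.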

>
>    $ sudo ./cyclictest -t -q -e 10000000 -i 500 -d 150 -l 2000
>    # /dev/cpu_dma_latency set to 10000000us
>    T: 0 ( 9703) P: 0 I:500 C:   2000 Min:      2 Act:   90 Avg:   77 Max:     243
>    T: 1 ( 9704) P: 0 I:650 C:   1557 Min:      2 Act:   58 Avg:   68 Max:     226
>    T: 2 ( 9705) P: 0 I:800 C:   1264 Min:      2 Act:   54 Avg:   81 Max:    1017
>    T: 3 ( 9706) P: 0 I:950 C:   1065 Min:      2 Act:   11 Avg:   80 Max:     260
>
>    $ sudo ./cyclictest -t -q -e 10000000 -i 500 -d 150 -l 2000
>    # /dev/cpu_dma_latency set to 10000000us
>    T: 0 ( 9709) P: 0 I:500 C:   2000 Min:      2 Act:   45 Avg:   74 Max:     390
>    T: 1 ( 9710) P: 0 I:650 C:   1554 Min:      2 Act:   82 Avg:   61 Max:     810
>    T: 2 ( 9711) P: 0 I:800 C:   1263 Min:      2 Act:   83 Avg:   74 Max:     287
>    T: 3 ( 9712) P: 0 I:950 C:   1064 Min:      2 Act:  103 Avg:   79 Max:     551
>
>    $ sudo ./cyclictest -t -q -e 10000000 -i 500 -d 150 -l 2000
>    # /dev/cpu_dma_latency set to 10000000us
>    T: 0 ( 9716) P: 0 I:500 C:   2000 Min:      2 Act:   82 Avg:   72 Max:     252
>    T: 1 ( 9717) P: 0 I:650 C:   1556 Min:      2 Act:  115 Avg:   77 Max:     354
>    T: 2 ( 9718) P: 0 I:800 C:   1264 Min:      2 Act:   59 Avg:   78 Max:    1143
>    T: 3 ( 9719) P: 0 I:950 C:   1065 Min:      2 Act:  104 Avg:   70 Max:     238
>
>    $ sudo ./cyclictest -t -q -e 10000000 -i 500 -d 150 -l 2000
>    # /dev/cpu_dma_latency set to 10000000us
>    T: 0 ( 9722) P: 0 I:500 C:   2000 Min:      2 Act:   82 Avg:   68 Max:     213
>    T: 1 ( 9723) P: 0 I:650 C:   1555 Min:      2 Act:   65 Avg:   65 Max:    1279
>    T: 2 ( 9724) P: 0 I:800 C:   1264 Min:      2 Act:   91 Avg:   69 Max:     244
>    T: 3 ( 9725) P: 0 I:950 C:   1065 Min:      2 Act:   58 Avg:   76 Max:     242
>
>
>> >
>> > cyclictest could be a good starting point but we need to improve it to
>> > allow threads of different loads, possibly starting multiple processes
>> > (can be done with a script), randomly varying load threads. These
>> > parameters should be loaded from a file so that we can have multiple
>> > configurations (per SoC and per use-case). But the big risk is that we
>> > try to optimise the scheduler for something which is not realistic.
>>
>> The goal of this simple bench is to measure the wake up latency and the
>> value that the scheduler can reach on a platform, not to emulate a "real"
>> use case. In the same way that sched-pipe tests a specific behavior of
>> the scheduler, this bench tests the wake up latency of a system.
>>
>> Starting multiple processes and adding some load can also be useful, but
>> the target will be a bit different from wake up latency. I have one
>> concern with randomness because it prevents repeatable and comparable
>> tests and results.
>>
>> I agree that we have to test "real" use cases, but that doesn't prevent
>> us from testing the limit of a characteristic on a system.
>>
>> >
>> >
>> > We are working on describing some basic scenarios (plain English for
>> > now) and one of them could be video playing with threads for audio and
>> > video decoding with random change in the workload.
>> >
>> > So I think the first step should be a set of tools/scripts to analyse
>> > the scheduler behaviour, both in terms of latency and power, and these
>> > can use perf sched. We can then run some real life scenarios (e.g.
>> > Android video playback) and build a benchmark that matches such
>> > behaviour as close as possible. We can probably use (or improve) perf
>> > sched replay to also simulate such workload (we may need additional
>> > features like thread dependencies).
>> >
>> >> The figures below give the average wakeup latency and power
>> >> consumption for default scheduler behavior, packing tasks at cluster
>> >> level and packing tasks at core level. We can see both wakeup latency
>> >> and power consumption variation. The detailed result is not a simple
>> >> single value which makes comparison not so easy but the average of all
>> >> measurements should give us a usable “score”.
>> >
>> > How did you assess the power/energy?
>>
>> I have used the embedded joule meter of the tc2.
>>
>> >
>> > Thanks.
>> >
>> > --
>> > Catalin
>>
>>             |  Default average results                  |  Cluster Packing average results          |  Core Packing average results
>>             |  Latency     stddev  A7 energy A15 energy |  Latency     stddev  A7 energy A15 energy |  Latency     stddev  A7 energy A15 energy
>>             |     (us)                   (J)        (J) |     (us)                   (J)        (J) |     (us)                   (J)        (J)
>>             |      879                794890    2364175 |      416                879688      12750 |      189                897452      30052
>>
>>  Cyclictest |  Default                                  |  Packing at Cluster level                 |  Packing at Core level
>>    Interval |  Latency     stddev  A7 energy A15 energy |  Latency     stddev  A7 energy A15 energy |  Latency     stddev  A7 energy A15 energy
>>        (us) |     (us)                   (J)        (J) |     (us)                   (J)        (J) |     (us)                   (J)        (J)
>>         500         24          1    1147477    2479576         21          1    1136768      11693         22          1    1126062      30138
>>         700         22          1    1136084    3058419         21          0    1125280      11761         21          1    1109950      23503
>
> < snip >
>
> Some questions about what these metrics are:
>
> The cyclictest data is reported per thread.  How did you combine the
> per thread data to get a single latency and stddev value?
>
> Is "Latency" the average latency?

Yes. I have described below the procedure I followed to get my results:

I run the same test (same parameters) several times (I have tried
between 5 and 10 runs and the results were similar).
For each run, I compute the average of the per thread Avg figures and I
compute the stddev between the per thread results.
The results that I sent are an average of all runs with the same parameters.
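
As a rough illustration of that procedure, here is a minimal sketch in
Python. It parses the per thread "Avg:" figures from the summary lines that
cyclictest prints with -q, then combines them as described above. The
function names (`thread_averages`, `score`) are hypothetical, the use of the
population standard deviation is an assumption (the procedure does not say
which estimator applies), and so is averaging the per run stddev across
runs. The sample input is taken from the first two laptop runs quoted
earlier in this thread, not from the tc2 measurements.

```python
import re
from statistics import mean, pstdev

# Matches a cyclictest summary line such as
# "T: 0 ( 9703) P: 0 I:500 C:   2000 Min:      2 Act:   90 Avg:   77 Max:     243"
AVG_RE = re.compile(r"^T:\s*\d+.*\bAvg:\s*(\d+)")


def thread_averages(output):
    """Extract the per thread Avg latencies (us) from one run's output."""
    averages = []
    for line in output.splitlines():
        match = AVG_RE.match(line.strip())
        if match:
            averages.append(int(match.group(1)))
    return averages


def score(run_outputs):
    """Combine several runs made with identical parameters.

    Per run: mean of the per thread Avg values and the stddev between
    threads. Overall: both figures averaged across the runs.
    """
    per_run = []
    for output in run_outputs:
        averages = thread_averages(output)
        per_run.append((mean(averages), pstdev(averages)))
    latency = mean([run_mean for run_mean, _ in per_run])
    stddev = mean([run_dev for _, run_dev in per_run])
    return latency, stddev


run1 = """# /dev/cpu_dma_latency set to 10000000us
T: 0 ( 9703) P: 0 I:500 C:   2000 Min:      2 Act:   90 Avg:   77 Max:     243
T: 1 ( 9704) P: 0 I:650 C:   1557 Min:      2 Act:   58 Avg:   68 Max:     226
T: 2 ( 9705) P: 0 I:800 C:   1264 Min:      2 Act:   54 Avg:   81 Max:    1017
T: 3 ( 9706) P: 0 I:950 C:   1065 Min:      2 Act:   11 Avg:   80 Max:     260
"""
run2 = """# /dev/cpu_dma_latency set to 10000000us
T: 0 ( 9709) P: 0 I:500 C:   2000 Min:      2 Act:   45 Avg:   74 Max:     390
T: 1 ( 9710) P: 0 I:650 C:   1554 Min:      2 Act:   82 Avg:   61 Max:     810
T: 2 ( 9711) P: 0 I:800 C:   1263 Min:      2 Act:   83 Avg:   74 Max:     287
T: 3 ( 9712) P: 0 I:950 C:   1064 Min:      2 Act:  103 Avg:   79 Max:     551
"""

latency, stddev = score([run1, run2])
print(f"latency={latency:.2f}us stddev={stddev:.2f}us")
```

Working only from the summary lines keeps the measurement itself unchanged,
since nothing beyond the -q output is needed.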

>
> stddev is not reported by cyclictest.  How did you create this value?  Did you
> use the "-v" cyclictest option to report detailed data, then calculate stddev
> from the detailed data?

No, I haven't used -v because it generates too many spurious wake ups,
which make the results irrelevant.

Vincent
>
> Thanks,
>
> -Frank