On 3 November 2011 15:36, Amit Kucheria <amit.kuche...@linaro.org> wrote:
> Vincent,
>
> What is the status of this after talks with the scheduler maintainers
> in Prague last week? Have you shared this with Peter Z et al.
>

Last week's discussion focused mainly on a new load balancer that will
add time-weighted task load monitoring. This should help us gather
lightly loaded tasks on one core, since we will be able to tell
short-running tasks from long-running ones. I'm also preparing a
description of typical ARM cpu topologies for Peter and Paul, who are
interested in such information. I have in mind the dual/quad core
topologies and the big.LITTLE one, which are oriented towards embedded
systems, but there might be others (server configurations, for
example). Feel free to point them out to me.

We also discussed why cpu hotplug has been used by the ARM guys for
power saving. I think they have understood our point and will study how
to meet our requirements. Peter and Paul are preparing a draft version
of the new load balancer.

Regarding this patch series, some patches might become useless with the
new load balancer, but we need to test it before drawing conclusions.
We will still have to manage different cpu capacities. This series is
also useful for figuring out some issues that appear when only one core
is used while both are available, but that are not directly linked to
the scheduler.

Vincent

> Regards,
> Amit
>
> On Fri, Oct 21, 2011 at 12:52 PM, Vincent Guittot
> <vincent.guit...@linaro.org> wrote:
>> The sched_mc feature was originally designed to improve power
>> consumption of multi-package systems, and several architecture
>> functions are available to tune the topology and the scheduler's
>> parameters when the scheduler rebuilds the sched_domain hierarchy
>> (change the sched_mc_power_savings level). This patch series is an
>> attempt to improve the power consumption of dual and quad Cortex-A9
>> when sched_mc_power_savings is set to 2. The following patch's policy
>> is to accept up to 4 threads (can be configured) in the run queue of
>> a core before starting to load balance if the cpu runs at low
>> frequencies, but to accept only 1 thread for high frequencies, which
>> is the normal behaviour.
>> The goal is to use only one cpu in light load situations and
>> both cpus in heavy load situations.
>>
>> Patches [1-3] modify the ARM cpu topology according to the
>> sched_mc_power_savings value and the Cortex id
>> Patch [4] enables the ARCH_POWER feature of the scheduler
>> Patch [5] adds the ARCH_POWER function for the ARM platform
>> Patches [6-7] modify the cpu_power of CA-9 according to the
>> sched_mc_power_savings level and the current frequency. The main
>> goal is to increase the capacity of a core when using a low cpu
>> frequency in order to pull tasks onto this core. Note that this
>> behaviour is not really advised, but it can be seen as an
>> intermediate step between the use of cpu hotplug (which is not a
>> power saving feature) and a new load balancer which will take into
>> account low load situations on dual core.
>> Patch [8] ensures that cpu0 is used in priority when only one CPU is
>> running
>> Patch [9] adds a debugfs interface for test purposes
>> Patch [10] ensures that the cpu_power will be updated periodically
>> Patch [11] fixes an issue around the trigger of ilb.
>>
>> TODO list:
>> - remove the useless start of ilb when the core has capacity
>> - add a method (DT, sysfs, ...) to set the threshold for using 1 or
>>   2 cpus on dual CA-9
>> - irq balancing
>>
>> The tests hereafter have been done on a u8500 with kernel linaro-3.1.
>> They check that there is no obvious loss of performance when
>> sched_mc=2.
>>
>> sysbench --test=cpu --num-threads=12 --max-time=20 run
>> Test execution summary:     sched_mc=0   sched_mc=2   cpu hotplug
>>   total number of events:   665          664          336
>>   per-request statistics:
>>     min:                    92.68ms      70.53ms      618.89ms
>>     avg:                    361.75ms     361.38ms     725.29ms
>>     max:                    421.08ms     420.73ms     840.74ms
>>     approx. 95 percentile:  402.28ms     390.53ms     760.17ms
>>
>> sysbench --test=threads --thread-locks=9 --num-threads=12 --max-time=20 run
>> Test execution summary:     sched_mc=0   sched_mc=2   cpu hotplug
>>   total number of events:   10000        10000        3129
>>   per-request statistics:
>>     min:                    1.62ms       1.70ms       13.16ms
>>     avg:                    22.23ms      21.87ms      76.77ms
>>     max:                    153.52ms     133.99ms     173.82ms
>>     approx. 95 percentile:  54.12ms      52.65ms      136.32ms
>>
>> sysbench --test=threads --thread-locks=2 --num-threads=3 --max-time=20 run
>> Test execution summary:     sched_mc=0   sched_mc=2   cpu hotplug
>>   total number of events:   10000        10000        10000
>>   per-request statistics:
>>     min:                    1.38ms       1.38ms       5.70ms
>>     avg:                    4.67ms       5.37ms       11.85ms
>>     max:                    36.84ms      32.42ms      32.58ms
>>     approx. 95 percentile:  14.34ms      12.89ms      21.30ms
>>
>> cyclictest -q -t -D 20
>> Only one cpu is used during this test when sched_mc=2, whereas both
>> cpus are used when sched_mc=0.
>> Test execution summary:     sched_mc=0   sched_mc=2   cpu hotplug
>>   Avg, Max:                 15, 434      19, 2145     17, 3556
>>   Avg, Max:                 14, 104      19, 1686     17, 3593
>>
>> Regards,
>> Vincent
>>
>

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev
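
For anyone following the thread, the capacity policy described in the quoted cover letter (up to 4 threads per core at low frequency, 1 thread at high frequency, only when sched_mc_power_savings is 2) can be modelled roughly as below. This is only a sketch in Python and not the code from the patches: the function names, the 400 MHz low-frequency threshold and the way cpu_power is rounded to a task count are all assumptions made for illustration.

```python
# Toy model of the sched_mc=2 policy from the cover letter.
# Everything here is hypothetical except the 4-thread / 1-thread limits,
# which come from the quoted description.

SCHED_POWER_SCALE = 1024  # assumed nominal cpu_power of one core


def cpu_power(freq_khz, sched_mc_level, low_freq_khz=400_000, max_threads_low=4):
    """Return an illustrative cpu_power value for one core.

    At sched_mc level 2 and at or below the (hypothetical) low-frequency
    threshold, the core advertises enough capacity for max_threads_low
    threads; otherwise it advertises the nominal capacity of one thread.
    """
    if sched_mc_level == 2 and freq_khz <= low_freq_khz:
        return max_threads_low * SCHED_POWER_SCALE
    return SCHED_POWER_SCALE


def capacity(power):
    """Convert cpu_power to a task count, rounding to the closest integer."""
    return (power + SCHED_POWER_SCALE // 2) // SCHED_POWER_SCALE


def should_spread(nr_running, freq_khz, sched_mc_level):
    """True when the run queue holds more tasks than the core's capacity,
    i.e. when load balancing would start moving tasks to the other core."""
    return nr_running > capacity(cpu_power(freq_khz, sched_mc_level))


if __name__ == "__main__":
    for nr, freq in [(3, 200_000), (5, 200_000), (2, 1_000_000)]:
        print(nr, freq, should_spread(nr, freq, sched_mc_level=2))
```

With these assumed values, three runnable tasks stay on one core at 200 MHz, while two runnable tasks at 1 GHz already exceed the capacity of a single core, which matches the "one cpu in light load, both cpus in heavy load" goal stated in the series.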