On 12/13/2012 11:48 PM, Vincent Guittot wrote:
> On 13 December 2012 15:53, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>> On 13 December 2012 15:25, Alex Shi <alex....@intel.com> wrote:
>>> On 12/13/2012 06:11 PM, Vincent Guittot wrote:
>>>> On 13 December 2012 03:17, Alex Shi <alex....@intel.com> wrote:
>>>>> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
>>>>>> During the creation of sched_domain, we define a pack buddy CPU
>>>>>> for each CPU when one is available. We want to pack at all levels
>>>>>> where a group of CPUs can be power gated independently from
>>>>>> others.
>>>>>> On a system that can't power gate a group of CPUs independently,
>>>>>> the flag is set at all sched_domain levels and the buddy is set
>>>>>> to -1. This is the default behavior.
>>>>>> On a dual-cluster / dual-core system which can power gate each
>>>>>> core and cluster independently, the buddy configuration will be:
>>>>>>
>>>>>>       | Cluster 0   | Cluster 1   |
>>>>>>       | CPU0 | CPU1 | CPU2 | CPU3 |
>>>>>> -----------------------------------
>>>>>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
>>>>>>
>>>>>> Small tasks tend to slip out of the periodic load balance, so the
>>>>>> best place to migrate them is during their wake up. The decision
>>>>>> is in O(1) as we only check against one buddy CPU.
>>>>>
>>>>> I just have a little worry about the scalability on a big machine:
>>>>> on a 4-socket NUMA machine with 8 cores * HT, the buddy cpu for
>>>>> the whole system needs to care about 64 LCPUs, while in your case
>>>>> cpu0 only cares about 4 LCPUs. That is a different task
>>>>> distribution decision.
>>>>
>>>> The buddy CPU should probably not be the same for all 64 LCPUs; it
>>>> depends on where it's worth packing small tasks.
>>>
>>> Do you have further ideas for the buddy cpu in such an example?
>>
>> Yes, I have several ideas which were not really relevant for small
>> systems but could be interesting for larger systems.
>>
>> We keep the same algorithm inside a socket, but we could either use
>> another LCPU in the targeted socket (conf0) or chain the sockets
>> (conf1) instead of packing directly into one LCPU.
>>
>> The scheme below tries to summarize the idea:
>>
>> Socket      |  socket 0  |  socket 1   |  socket 2   |  socket 3   |
>> LCPU        | 0 | 1-15   | 16 | 17-31  | 32 | 33-47  | 48 | 49-63  |
>> buddy conf0 | 0 | 0      | 1  | 16     | 2  | 32     | 3  | 48     |
>> buddy conf1 | 0 | 0      | 0  | 16     | 16 | 32     | 32 | 48     |
>> buddy conf2 | 0 | 0      | 16 | 16     | 32 | 32     | 48 | 48     |
>>
>> But I don't know how this would interact with the NUMA load balance,
>> and the better choice might be to use conf3.
>
> I mean conf2, not conf3.
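If I read the conf0/conf1/conf2 table correctly, the three layouts boil
down to something like the stand-alone sketch below. This is only my
reading of your table: the 4 x 16 contiguous LCPU numbering, pack_buddy()
and the CONF* names are invented for the illustration and are not code
from your patch set.

/*
 * Stand-alone mock-up of the three buddy layouts in the table above,
 * assuming a 4-socket box with 16 contiguously numbered LCPUs per
 * socket.  pack_buddy() and the CONF* names are invented for this
 * illustration; they are not taken from the patch set.
 */
#include <stdio.h>

#define LCPUS_PER_SOCKET	16

enum pack_conf { CONF0, CONF1, CONF2 };

static int pack_buddy(int cpu, enum pack_conf conf)
{
	int socket = cpu / LCPUS_PER_SOCKET;
	int first = socket * LCPUS_PER_SOCKET;	/* first LCPU of the socket */

	/* Inside a socket, everybody packs onto the socket's first LCPU. */
	if (cpu != first)
		return first;

	/* Socket 0 is the final packing target in all three layouts. */
	if (socket == 0)
		return 0;

	switch (conf) {
	case CONF0:	/* use another LCPU of socket 0: 16->1, 32->2, 48->3 */
		return socket;
	case CONF1:	/* chain the sockets: 48->32, 32->16, 16->0 */
		return first - LCPUS_PER_SOCKET;
	case CONF2:	/* stop at the socket level, no cross-socket packing */
	default:
		return first;
	}
}

int main(void)
{
	int samples[8] = { 0, 5, 16, 20, 32, 40, 48, 63 };
	int i;

	/* Reproduce the three "buddy conf" rows for a few sample LCPUs. */
	for (i = 0; i < 8; i++)
		printf("LCPU %2d: conf0=%2d conf1=%2d conf2=%2d\n", samples[i],
		       pack_buddy(samples[i], CONF0),
		       pack_buddy(samples[i], CONF1),
		       pack_buddy(samples[i], CONF2));

	return 0;
}

Running it just reproduces the three buddy rows for a few sample LCPUs,
which is how I understand your proposal.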
So, it has 4 levels (0/16/32/48) for socket 3 and 0 levels for socket 0;
that is unbalanced across the sockets. And the ground level has just one
buddy for 16 LCPUs (8 cores), which is not a good design. Consider my
previous example: if there are 4 or 8 tasks in one socket, you have just
2 choices: spread them across all the cores, or pack them into one LCPU.
Actually, moving them onto just 2 or 4 cores may be a better solution,
but the design misses this. Obviously, more and more cores is the trend
for any kind of CPU, and the buddy scheme seems hard pressed to keep up
with this.
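To make that concrete: a toy sketch like the one below would put 4 small
tasks on 2 LCPUs and 8 small tasks on 4 LCPUs of the same socket, instead
of the one-LCPU-or-all-cores choice. It is purely an illustration of the
gap I mean, not a proposal: the spill threshold, the per-LCPU counter and
pack_target() are all made up and are not part of the current patch.

/*
 * Hypothetical illustration only, not code from the patch: one way the
 * "2 or 4 cores" middle ground could look.  Instead of one buddy per
 * socket, the packing target spills over to the next LCPU of the same
 * socket once the current one already carries a couple of small tasks.
 */
#include <stdio.h>

#define NR_LCPUS		64
#define LCPUS_PER_SOCKET	16
#define SMALL_TASKS_PER_LCPU	2	/* arbitrary spill threshold */

static int small_tasks[NR_LCPUS];	/* toy stand-in for real load data */

static int pack_target(int waking_cpu)
{
	int first = (waking_cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;
	int cpu;

	/* Walk the socket and stop at the first LCPU that still has room. */
	for (cpu = first; cpu < first + LCPUS_PER_SOCKET; cpu++)
		if (small_tasks[cpu] < SMALL_TASKS_PER_LCPU)
			return cpu;

	/* Socket already full of small tasks: keep the waking CPU. */
	return waking_cpu;
}

int main(void)
{
	int i;

	/* Wake 8 small tasks on socket 1: they end up on LCPUs 16-19. */
	for (i = 0; i < 8; i++) {
		int target = pack_target(17);

		small_tasks[target]++;
		printf("task %d packed on LCPU %d\n", i, target);
	}
	return 0;
}

Of course such a walk is no longer a strict O(1) buddy lookup; that is
exactly the trade-off I think will need revisiting as core counts keep
growing.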