On 12/13/2012 11:48 PM, Vincent Guittot wrote:
> On 13 December 2012 15:53, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>> On 13 December 2012 15:25, Alex Shi <alex....@intel.com> wrote:
>>> On 12/13/2012 06:11 PM, Vincent Guittot wrote:
>>>> On 13 December 2012 03:17, Alex Shi <alex....@intel.com> wrote:
>>>>> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
>>>>>> During the creation of sched_domain, we define a pack buddy CPU
>>>>>> for each CPU when one is available. We want to pack at all levels
>>>>>> where a group of CPUs can be power gated independently from
>>>>>> others.
>>>>>> On a system that can't power gate a group of CPUs independently,
>>>>>> the flag is set at all sched_domain levels and the buddy is set
>>>>>> to -1. This is the default behavior.
>>>>>> On a dual-cluster / dual-core system which can power gate each
>>>>>> core and cluster independently, the buddy configuration will be:
>>>>>>
>>>>>>       | Cluster 0   | Cluster 1   |
>>>>>>       | CPU0 | CPU1 | CPU2 | CPU3 |
>>>>>> -----------------------------------
>>>>>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
>>>>>>
>>>>>> Small tasks tend to slip out of the periodic load balance, so the
>>>>>> best place to migrate them is during their wake up. The decision
>>>>>> is in O(1) as we only check against one buddy CPU.
>>>>>
>>>>> I just have a little worry about the scalability on a big machine:
>>>>> on a 4-socket NUMA machine with 8 cores * HT, the buddy cpu for
>>>>> the whole system needs to care about 64 LCPUs, while in your case
>>>>> cpu0 only cares about 4 LCPUs. That is a different task
>>>>> distribution decision.
>>>>
>>>> The buddy CPU should probably not be the same for all 64 LCPUs; it
>>>> depends on where it's worth packing small tasks.
>>>
>>> Do you have further ideas for the buddy cpu in such an example?
>>
>> Yes, I have several ideas which were not really relevant for small
>> systems but could be interesting for larger systems.
>>
>> We keep the same algorithm inside a socket, but we could either use
>> another LCPU in the targeted socket (conf0) or chain the sockets
>> (conf1) instead of packing directly into one LCPU.
>>
>> The scheme below tries to summarize the idea:
>>
>> Socket      |  socket 0  |  socket 1   |  socket 2   |  socket 3   |
>> LCPU        | 0 | 1-15   | 16 | 17-31  | 32 | 33-47  | 48 | 49-63  |
>> buddy conf0 | 0 | 0      | 1  | 16     | 2  | 32     | 3  | 48     |
>> buddy conf1 | 0 | 0      | 0  | 16     | 16 | 32     | 32 | 48     |
>> buddy conf2 | 0 | 0      | 16 | 16     | 32 | 32     | 48 | 48     |
>>
>> But I don't know how this would interact with the NUMA load balance,
>> and the better choice might be to use conf3.
>
> I mean conf2, not conf3.
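If I read the conf0/conf1/conf2 table correctly, the three layouts boil
down to something like the stand-alone sketch below. This is only my
reading of your table: the 4 x 16 contiguous LCPU numbering, pack_buddy()
and the CONF* names are invented for the illustration and are not code
from your patch set.

/*
 * Stand-alone mock-up of the three buddy layouts in the table above,
 * assuming a 4-socket box with 16 contiguously numbered LCPUs per
 * socket.  pack_buddy() and the CONF* names are invented for this
 * illustration; they are not taken from the patch set.
 */
#include <stdio.h>

#define LCPUS_PER_SOCKET	16

enum pack_conf { CONF0, CONF1, CONF2 };

static int pack_buddy(int cpu, enum pack_conf conf)
{
	int socket = cpu / LCPUS_PER_SOCKET;
	int first = socket * LCPUS_PER_SOCKET;	/* first LCPU of the socket */

	/* Inside a socket, everybody packs onto the socket's first LCPU. */
	if (cpu != first)
		return first;

	/* Socket 0 is the final packing target in all three layouts. */
	if (socket == 0)
		return 0;

	switch (conf) {
	case CONF0:	/* use another LCPU of socket 0: 16->1, 32->2, 48->3 */
		return socket;
	case CONF1:	/* chain the sockets: 48->32, 32->16, 16->0 */
		return first - LCPUS_PER_SOCKET;
	case CONF2:	/* stop at the socket level, no cross-socket packing */
	default:
		return first;
	}
}

int main(void)
{
	int samples[8] = { 0, 5, 16, 20, 32, 40, 48, 63 };
	int i;

	/* Reproduce the three "buddy conf" rows for a few sample LCPUs. */
	for (i = 0; i < 8; i++)
		printf("LCPU %2d: conf0=%2d conf1=%2d conf2=%2d\n", samples[i],
		       pack_buddy(samples[i], CONF0),
		       pack_buddy(samples[i], CONF1),
		       pack_buddy(samples[i], CONF2));

	return 0;
}

Running it just reproduces the three buddy rows for a few sample LCPUs,
which is how I understand your proposal.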
So, it has 4 levels (0/16/32/48) for socket 3 and 0 levels for socket 0;
that is unbalanced across the sockets. And the ground level has just one
buddy for 16 LCPUs (8 cores), which is not a good design. Consider my
previous example: if there are 4 or 8 tasks in one socket, you have just
2 choices: spread them across all the cores, or pack them into one LCPU.
Actually, moving them onto just 2 or 4 cores may be a better solution,
but the design misses this. Obviously, more and more cores is the trend
for any kind of CPU, and the buddy scheme seems hard pressed to keep up
with this.
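To make that concrete: a toy sketch like the one below would put 4 small
tasks on 2 LCPUs and 8 small tasks on 4 LCPUs of the same socket, instead
of the one-LCPU-or-all-cores choice. It is purely an illustration of the
gap I mean, not a proposal: the spill threshold, the per-LCPU counter and
pack_target() are all made up and are not part of the current patch.

/*
 * Hypothetical illustration only, not code from the patch: one way the
 * "2 or 4 cores" middle ground could look.  Instead of one buddy per
 * socket, the packing target spills over to the next LCPU of the same
 * socket once the current one already carries a couple of small tasks.
 */
#include <stdio.h>

#define NR_LCPUS		64
#define LCPUS_PER_SOCKET	16
#define SMALL_TASKS_PER_LCPU	2	/* arbitrary spill threshold */

static int small_tasks[NR_LCPUS];	/* toy stand-in for real load data */

static int pack_target(int waking_cpu)
{
	int first = (waking_cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;
	int cpu;

	/* Walk the socket and stop at the first LCPU that still has room. */
	for (cpu = first; cpu < first + LCPUS_PER_SOCKET; cpu++)
		if (small_tasks[cpu] < SMALL_TASKS_PER_LCPU)
			return cpu;

	/* Socket already full of small tasks: keep the waking CPU. */
	return waking_cpu;
}

int main(void)
{
	int i;

	/* Wake 8 small tasks on socket 1: they end up on LCPUs 16-19. */
	for (i = 0; i < 8; i++) {
		int target = pack_target(17);

		small_tasks[target]++;
		printf("task %d packed on LCPU %d\n", i, target);
	}
	return 0;
}

Of course such a walk is no longer a strict O(1) buddy lookup; that is
exactly the trade-off I think will need revisiting as core counts keep
growing.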