> -----Original Message-----
> From: Tim Chen [mailto:tim.c.c...@linux.intel.com]
> Sent: Friday, January 8, 2021 12:17 PM
> To: Song Bao Hua (Barry Song) <song.bao....@hisilicon.com>;
> valentin.schnei...@arm.com; catalin.mari...@arm.com; w...@kernel.org;
> r...@rjwysocki.net; vincent.guit...@linaro.org; l...@kernel.org;
> gre...@linuxfoundation.org; Jonathan Cameron <jonathan.came...@huawei.com>;
> mi...@redhat.com; pet...@infradead.org; juri.le...@redhat.com;
> dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> mgor...@suse.de; mark.rutl...@arm.com; sudeep.ho...@arm.com;
> aubrey...@linux.intel.com
> Cc: linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; linux...@openeuler.org; xuwei (O)
> <xuw...@huawei.com>; Zengtao (B) <prime.z...@hisilicon.com>; tiantao (H)
> <tiant...@hisilicon.com>
> Subject: Re: [RFC PATCH v3 0/2] scheduler: expose the topology of clusters and
> add cluster scheduler
> 
> 
> 
> On 1/6/21 12:30 AM, Barry Song wrote:
> > ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and each
> > cluster has 4 cpus. All clusters share L3 cache data while each cluster
> > has local L3 tag. On the other hand, each cluster will share some
> > internal system bus. This means cache is much more affine inside one cluster
> > than across clusters.
> >
> >     +-----------------------------------+                          +---------+
> >     |  +------+    +------+            +---------------------------+         |
> >     |  | CPU0 |    | cpu1 |             |    +-----------+         |         |
> >     |  +------+    +------+             |    |           |         |         |
> >     |                                   +----+    L3     |         |         |
> >     |  +------+    +------+   cluster   |    |    tag    |         |         |
> >     |  | CPU2 |    | CPU3 |             |    |           |         |         |
> >     |  +------+    +------+             |    +-----------+         |         |
> >     |                                   |                          |         |
> >     +-----------------------------------+                          |         |
> >     +-----------------------------------+                          |         |
> >     |  +------+    +------+             +--------------------------+         |
> >     |  |      |    |      |             |    +-----------+         |         |
> >     |  +------+    +------+             |    |           |         |         |
> >     |                                   |    |    L3     |         |         |
> >     |  +------+    +------+             +----+    tag    |         |         |
> >     |  |      |    |      |             |    |           |         |         |
> >     |  +------+    +------+             |    +-----------+         |         |
> >     |                                   |                          |         |
> >     +-----------------------------------+                          |   L3    |
> >                                                                    |   data  |
> >     +-----------------------------------+                          |         |
> >     |  +------+    +------+             |    +-----------+         |         |
> >     |  |      |    |      |             |    |           |         |         |
> >     |  +------+    +------+             +----+    L3     |         |         |
> >     |                                   |    |    tag    |         |         |
> >     |  +------+    +------+             |    |           |         |         |
> >     |  |      |    |      |            ++    +-----------+         |         |
> >     |  +------+    +------+            |---------------------------+         |
> >     +-----------------------------------|                          |         |
> >     +-----------------------------------|                          |         |
> >     |  +------+    +------+            +---------------------------+         |
> >     |  |      |    |      |             |    +-----------+         |         |
> >     |  +------+    +------+             |    |           |         |         |
> >     |                                   +----+    L3     |         |         |
> >     |  +------+    +------+             |    |    tag    |         |         |
> >     |  |      |    |      |             |    |           |         |         |
> >     |  +------+    +------+             |    +-----------+         |         |
> >     |                                   |                          |         |
> >     +-----------------------------------+                          |         |
> >     +-----------------------------------+                          |         |
> >     |  +------+    +------+             +--------------------------+         |
> >     |  |      |    |      |             |   +-----------+          |         |
> >     |  +------+    +------+             |   |           |          |         |
> >
> >
> 
> There is a similar need for clustering on x86.  Some x86 cores share an L2
> cache in a way that is similar to the cluster in Kunpeng 920 (e.g. on
> Jacobsville there are 6 clusters of 4 Atom cores, each cluster sharing a
> separate L2, and all 24 cores sharing the L3).
> Having a sched domain at the L2 cluster level helps spread load among
> L2 domains.  This reduces L2 cache contention and helps
> performance in low to moderate load scenarios.
> 
> The cluster detection mechanism will need
> to be based on L2 cache sharing in this case.  I suggest making
> cluster detection CPU architecture dependent so that both the ARM64 and
> x86 use cases can be accommodated.
> 
> Attached below are two RFC patches for creating an x86 L2
> cache sched domain, sans the idle CPU selection on wake-up code.  They are
> similar enough in concept to Barry's patch that we should have a
> single patchset that accommodates both use cases.

Hi Tim,

Agreed. Hopefully the RFC v4 I am preparing will cover your case.
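
For illustration only, the L2-based grouping described above can be sketched
in plain user-space C. This is not code from either patch set: the names
SKETCH_NR_CPUS, l2_id_of() and cluster_mask() are hypothetical, and the rule
that cores sharing an L2 are numbered contiguously is an assumption made to
mirror the "6 clusters of 4 cores" example. A real implementation would get
the cache id from architecture-specific topology data instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 6 clusters x 4 cores, as in the Jacobsville example. */
#define SKETCH_NR_CPUS 24
#define SKETCH_CPUS_PER_L2 4

/*
 * Hypothetical arch hook: return an id for the L2 cache that 'cpu' uses.
 * Assumption: cores that share an L2 are numbered contiguously.
 */
static int l2_id_of(int cpu)
{
	return cpu / SKETCH_CPUS_PER_L2;
}

/* Return a bitmask of every CPU that shares an L2 cache with 'cpu'. */
static uint32_t cluster_mask(int cpu)
{
	uint32_t mask = 0;
	int other;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);

	for (other = 0; other < SKETCH_NR_CPUS; other++) {
		if (l2_id_of(other) == l2_id_of(cpu))
			mask |= UINT32_C(1) << other;
	}
	return mask;
}
```

With this layout, cluster_mask() returns the same mask for CPUs 0-3 and a
different, non-overlapping mask for CPUs 4-7, which is the span a
cluster-level sched domain would cover. Only l2_id_of() would need to change
per architecture.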

> 
> Thanks.
> 
> Tim

Thanks
Barry
