Hi Venkat,

On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
* Chen, Yu C <[email protected]> [2026-05-25 23:35:45]:

Hi Venkat,

On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
Greetings!!!

I am seeing an early boot kernel panic due to NULL pointer dereference
on a POWER9 (pSeries) system when testing linux-next (next-20260522).

It seems that cpumask_first(llc_mask(i)) dereferences the NULL
pointer returned by cpu_coregroup_mask():

has_coregroup_support() is false, thus cpu_coregroup_map
is never allocated in smp_prepare_cpus().
This machine is a "shared system" VM. We should probably
let the LLC id generation fall back to using L2 id if
cpu_coregroup_mask is unavailable (which restores the
behavior before this patch). I'm wondering if the following
change would help (need IBM friends' help on this):

Power9 and older systems don't have coregroups.
That's not because of the shared LPAR; it is true for dedicated LPARs too.
Only Power10 and later systems have hemispheres, where we add MC/coregroup
support.


OK, thanks for the correction. Are you saying coregroup_enabled is false
on Power9 and older hardware, and set to true on Power10? Power10 has a
corresponding device-tree property, which is parsed to enable hemisphere
support in find_possible_nodes(). This is why has_coregroup_support()
returns true for Power10.


diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..cf6c2e4190ab 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
  }
  #endif

-struct cpumask *cpu_coregroup_mask(int cpu)
-{
-       return per_cpu(cpu_coregroup_map, cpu);
-}
-
  static bool has_coregroup_support(void)
  {
         /* Coregroup identification not available on shared systems */
@@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
         return coregroup_enabled;
  }

+struct cpumask *cpu_coregroup_mask(int cpu)
+{
+       if (!has_coregroup_support())
+               return cpu_l2_cache_mask(cpu);
+
+       return per_cpu(cpu_coregroup_map, cpu);
+}
+

While this is a workaround for the problem on Power9,
it will hurt Power10 and Power11 systems.
As Prateek alluded to, MC is not the LLC on Power.

Could you please elaborate on the cache topology?
Specifically, could you clarify what the LLC is for Power9
and Power10 respectively? Is it always the L2 cache?

I have checked the IBM documentation available at:
https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf
According to the document, a hemisphere corresponds to a 64MB
L3 cache shared by 8 cores. Since the MC domain spans a single
hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
for the MC domain?

So by defining llc_mask as cpu_coregroup_mask() we run the risk of assuming
that MC is the same as the LLC, which will impact Power10/11 systems.

In commit b5ea300a17e3 ("sched/cache: Make LLC id continuous"), we define
#define llc_mask(cpu) cpu_coregroup_mask(cpu)

Defining llc_mask as cpu_coregroup_mask() means MC must be the LLC.
That is not true for some architectures, at least not on Power.


OK.

So shouldn't it be using
#define llc_mask(cpu) per_cpu(sd_llc, cpu)

This should work for systems where the LLC is sub-coregroup, coregroup, or
super-coregroup (say some archs want the LLC at PKG and the cluster at coregroup).

If we do that, I don't think we even need the else case where we say
#define llc_mask(cpu) cpumask_of(cpu)


I suppose you are referring to
sched_domain_span(per_cpu(sd_llc, cpu)).

Indeed, deriving the LLC from the SD_SHARE_LLC level offers
better scalability. However, this approach would involve scheduler
domains, which can be truncated by cpuset partitions - a scenario we
prefer to avoid.
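
To make the truncation concern concrete, here is a minimal user-space
sketch (not kernel code, untested against any real tree). Everything in
it is an assumption made for illustration: the 8-CPU layout, the
uint8_t bitmasks, and the helper names topo_llc_mask(),
domain_llc_span() and mask_first(), which only stand in for the
topology mask, the sched-domain span and cpumask_first() respectively.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical topology: CPUs 0-3 share one LLC, CPUs 4-7 share another. */
static uint8_t topo_llc_mask(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu < 4 ? 0x0f : 0xf0;
}

/*
 * Model of a domain-derived LLC span: the topology mask clipped to the
 * cpuset partition that contains this CPU.
 */
static uint8_t domain_llc_span(int cpu, uint8_t partition)
{
	assert(partition & (1u << cpu));
	return topo_llc_mask(cpu) & partition;
}

/* Index of the lowest set bit; stands in for cpumask_first(). */
static int mask_first(uint8_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}
```

With partitions {0,1} and {2-7} (mask 0xfc), topo_llc_mask(2) still
covers CPUs 0-3, but domain_llc_span(2, 0xfc) shrinks to CPUs 2-3, so
the first CPU of the "LLC" moves from 0 to 2. An id derived from the
domain span would therefore change with the partition layout, while one
derived from the topology mask would not.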

thanks,
Chenyu
