On 2024/8/16 23:55, Dietmar Eggemann wrote:
> On 06/08/2024 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyic...@hisilicon.com>
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the SMT number of each core when building the topology
>> and we can know the largest SMT number in the system. Then we can
>> enable the support of SMT control.
>>
>> Signed-off-by: Yicong Yang <yangyic...@hisilicon.com>
>> ---
>>  arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>>  #include <linux/arch_topology.h>
>>  #include <linux/cacheinfo.h>
>>  #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>>  #include <linux/init.h>
>>  #include <linux/percpu.h>
>> +#include <linux/xarray.h>
>>  
>>  #include <asm/cpu.h>
>>  #include <asm/cputype.h>
>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>   */
>>  int __init parse_acpi_topology(void)
>>  {
>> +    int thread_num, max_smt_thread_num = 1;
>> +    struct xarray core_threads;
>>      int cpu, topology_id;
>> +    void *entry;
>>  
>>      if (acpi_disabled)
>>              return 0;
>>  
>> +    xa_init(&core_threads);
>> +
>>      for_each_possible_cpu(cpu) {
>>              topology_id = find_acpi_cpu_topology(cpu, 0);
>>              if (topology_id < 0)
>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>                      cpu_topology[cpu].thread_id = topology_id;
>>                      topology_id = find_acpi_cpu_topology(cpu, 1);
>>                      cpu_topology[cpu].core_id   = topology_id;
>> +
>> +                    entry = xa_load(&core_threads, topology_id);
>> +                    if (!entry) {
>> +                            xa_store(&core_threads, topology_id,
>> +                                     xa_mk_value(1), GFP_KERNEL);
>> +                    } else {
>> +                            thread_num = xa_to_value(entry);
>> +                            thread_num++;
>> +                            xa_store(&core_threads, topology_id,
>> +                                     xa_mk_value(thread_num), GFP_KERNEL);
>> +
>> +                            if (thread_num > max_smt_thread_num)
>> +                                    max_smt_thread_num = thread_num;
>> +                    }
> 
> So the xarray contains one element for each core_id with the information
> how often the core_id occurs? I assume you have to iterate over all
> possible CPUs since you don't know which logical CPUs belong to the same
> core_id.
> 

Each xarray element counts the threads of one core id. The logic is:
1. If the entry for this core id doesn't exist, we're seeing this core for the
   first time. Create the entry and set its thread count to 1.
2. Otherwise increment the thread count of the core id this CPU belongs to
   (PPTT has already told us which core that is) and update
   max_smt_thread_num if necessary.

This way we learn max_smt_thread_num while iterating over the PPTT table to
build the topology for all possible CPUs.

Otherwise we'd need a second scan for the max thread number after the topology
has been built. That is how v1 did it, and the complaint was the overhead on
large-scale systems, since we'd loop over the CPUs twice.
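
To illustrate the single-pass counting outside the kernel, here is a minimal
userspace sketch. It is not the patch code: a fixed-size array stands in for
the xarray, and max_threads_per_core(), MAX_CORE_ID and the core_id_of_cpu[]
input are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CORE_ID 1024	/* hypothetical bound, only for this sketch */

/*
 * Single pass over the CPUs. core_id_of_cpu[cpu] is the core id that the
 * firmware table reported for that CPU. Returns the largest number of CPUs
 * sharing one core id (1 if no core has SMT siblings).
 */
static int max_threads_per_core(const int *core_id_of_cpu, size_t nr_cpus)
{
	int threads[MAX_CORE_ID] = { 0 };	/* stands in for the xarray */
	int max_smt_thread_num = 1;
	size_t cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		int core_id = core_id_of_cpu[cpu];

		assert(core_id >= 0 && core_id < MAX_CORE_ID);

		/* First visit takes the count from 0 to 1, later visits bump it. */
		threads[core_id]++;
		if (threads[core_id] > max_smt_thread_num)
			max_smt_thread_num = threads[core_id];
	}

	return max_smt_thread_num;
}
```

With 8 CPUs alternating between two core ids, { 0, 1, 0, 1, 0, 1, 0, 1 }, the
function returns 4. The sketch assumes small, dense core ids; the patch uses an
xarray instead, so it does not need such a bound.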

>>              } else {
>>                      cpu_topology[cpu].thread_id  = -1;
>>                      cpu_topology[cpu].core_id    = topology_id;
>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>              cpu_topology[cpu].package_id = topology_id;
>>      }
>>  
>> +    cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> +
>> +    xa_destroy(&core_threads);
>>      return 0;
>>  }
>>  #endif
> 
> Tested on ThunderX2:
> 
> $ cat /proc/schedstat | head -6 | tail -4 | awk '{ print $1, $2 }'
> cpu0 0
> domain0 00000000,00000000,00000000,00000000,00000001,00000001,00000001,00000001
>                                                    ^        ^        ^        ^
> domain1 00000000,00000000,00000000,00000000,ffffffff,ffffffff,ffffffff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> 
> detecting 'max_smt_thread_num = 4' correctly.
> 

Thanks for testing. Would you be OK with giving a tag for this?

Thanks.

