[Public] Hi,
Sorry for the confidential header before. I finally learned how to remove them... > -----Original Message----- > From: Jan Beulich <jbeul...@suse.com> > Sent: Wednesday, February 12, 2025 12:46 AM > To: Penny, Zheng <penny.zh...@amd.com> > Cc: Huang, Ray <ray.hu...@amd.com>; Andryuk, Jason > <jason.andr...@amd.com>; Andrew Cooper <andrew.coop...@citrix.com>; > Roger Pau Monné <roger....@citrix.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for > cpufreq scaling > > On 06.02.2025 09:32, Penny Zheng wrote: > > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c > > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c > > +} > > + > > +static int cf_check amd_cppc_write_request(int cpu, uint8_t min_perf, > > + uint8_t des_perf, uint8_t > > +max_perf) { > > + struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu); > > + uint64_t prev = data->req.raw; > > + > > + data->req.min_perf = min_perf; > > + data->req.max_perf = max_perf; > > + data->req.des_perf = des_perf; > > + > > + if ( prev == data->req.raw ) > > + return 0; > > + > > + on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs, > > + data, 1); > > + > > + return data->err; > > +} > > ... in this function. Then the field would be written to (and the cacheline > change > ownership) only in the error case. > > As to the function's parameters - why's there a plain int? > Are you asking why "int cpu" here? > > + /* Package level MSR */ > > + if ( rdmsr_safe(MSR_AMD_CPPC_ENABLE, val) ) > > + { > > + amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_ENABLE)\n"); > > + goto err; > > + } > > + > > + /* > > + * Only when Enable bit is on, the hardware will calculate the > > processor’s > > + * performance capabilities and initialize the performance level > > fields in > > + * the CPPC capability registers. > > + */ > > + if ( !(val & AMD_CPPC_ENABLE) ) > > + { > > + val |= AMD_CPPC_ENABLE; > > + if ( wrmsr_safe(MSR_AMD_CPPC_ENABLE, val) ) > > + { > > + amd_cppc_err(policy->cpu, > "wrmsr_safe(MSR_AMD_CPPC_ENABLE, %lx)\n", val); > > + goto err; > > + } > > + } > > + > > + if ( rdmsr_safe(MSR_AMD_CPPC_CAP1, data->caps.raw) ) > > + { > > + amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_CAP1)\n"); > > + goto err; > > + } > > + > > + if ( data->caps.highest_perf == 0 || data->caps.lowest_perf == 0 || > > + data->caps.nominal_perf == 0 || data->caps.lowest_nonlinear_perf > > == 0 ) > > + { > > + amd_cppc_err(policy->cpu, > > + "Platform malfunction, read CPPC highest_perf: %u, > lowest_perf: %u, nominal_perf: %u, lowest_nonlinear_perf: %u zero value\n", > > + data->caps.highest_perf, data->caps.lowest_perf, > > + data->caps.nominal_perf, > > data->caps.lowest_nonlinear_perf); > > + goto err; > > + } > > + > > + data->err = amd_get_min_freq(data, &min_freq); > > + if ( data->err ) > > + return; > > + > > + data->err = amd_get_nominal_freq(data, &nominal_freq); > > + if ( data->err ) > > + return; > > + > > + data->err = amd_get_max_freq(data, &max_freq); > > + if ( data->err ) > > + return; > > + > > + if ( min_freq > max_freq ) > > + { > > + amd_cppc_err(policy->cpu, "min_freq(%u) or max_freq(%u) value is > incorrect\n", > > + min_freq, max_freq); > > + goto err; > > + } > > And nominal freq not being between the two is okay? > > > + policy->min = min_freq; > > + policy->max = max_freq; > > + > > + policy->cpuinfo.min_freq = min_freq; > > + policy->cpuinfo.max_freq = max_freq; > > + policy->cpuinfo.perf_freq = nominal_freq; > > + policy->cur = nominal_freq; > > How do you know this is correct? Or does it simply not matter at this point? > I need policy->cur to be set for correctly using ondemand governor. As we don't have any MSR register to reflect runtime perf/freq value under CPPC control, I will suggest to use APERF/MPERF average frequency as current freq here. > > + /* Initial processor data capability frequencies */ > > + data->min_freq = min_freq; > > + data->nominal_freq = nominal_freq; > > + data->max_freq = max_freq; > > Is this data duplication actually needed? Can't data, if necessary, simply > have a > back pointer to the policy for the CPU? > > Actually, aren't two of the three values already accessible through the > cppc_data > pointer hanging off of data? Which in turn makes me wonder why you go through > the effort of calculating those values. > cppc_data->lowest/nominal frequency comes from ACPI _CPC entry, and they are optional fields. So we need to calculate them when unavailable. We need them to set policy->min/max, which are referred in ondemand governor. But you're right, at least in this patch, we are not using data->min/max/nominal anywhere > > + err: > > + data->err = -EINVAL; > > +} > > At this point you may have set the enable bit already, which you can't undo. > What effect will this have on the system when the error path is taken here? > Especially if we then try to fall back to another driver? > On current code logic, when the error path is taken, amd-cppc cpufreq driver fails to initialize. And we will also not fall back to another driver. As we could register *only one* cpufreq driver. If amd-cppc is registered properly before, then legacy P-states will not get registered. In amd-cppc registration code: ``` +int __init amd_cppc_register_driver(void) +{ + if ( !cpu_has_cppc ) + return -ENODEV; + + return cpufreq_register_driver(&amd_cppc_cpufreq_driver); +} ``` CPPC feature MSR gets checked before the registration, which is to check whether the hardware has CPPC msr support. > Jan Many thanks, Penny