[Public]

Hi,

Sorry for the confidential header before. I finally learned how to remove 
them...

> -----Original Message-----
> From: Jan Beulich <jbeul...@suse.com>
> Sent: Wednesday, February 12, 2025 12:46 AM
> To: Penny, Zheng <penny.zh...@amd.com>
> Cc: Huang, Ray <ray.hu...@amd.com>; Andryuk, Jason
> <jason.andr...@amd.com>; Andrew Cooper <andrew.coop...@citrix.com>;
> Roger Pau Monné <roger....@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for
> cpufreq scaling
>
> On 06.02.2025 09:32, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +}
> > +
> > +static int cf_check amd_cppc_write_request(int cpu, uint8_t min_perf,
> > +                                           uint8_t des_perf, uint8_t
> > +max_perf) {
> > +    struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> > +    uint64_t prev = data->req.raw;
> > +
> > +    data->req.min_perf = min_perf;
> > +    data->req.max_perf = max_perf;
> > +    data->req.des_perf = des_perf;
> > +
> > +    if ( prev == data->req.raw )
> > +        return 0;
> > +
> > +    on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs,
> > + data, 1);
> > +
> > +    return data->err;
> > +}
>
> ... in this function. Then the field would be written to (and the cacheline 
> change
> ownership) only in the error case.
>
> As to the function's parameters - why's there a plain int?
>

Are you asking why "int cpu" here?

> > +    /* Package level MSR */
> > +    if ( rdmsr_safe(MSR_AMD_CPPC_ENABLE, val) )
> > +    {
> > +        amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_ENABLE)\n");
> > +        goto err;
> > +    }
> > +
> > +    /*
> > +     * Only when Enable bit is on, the hardware will calculate the 
> > processor’s
> > +     * performance capabilities and initialize the performance level 
> > fields in
> > +     * the CPPC capability registers.
> > +     */
> > +    if ( !(val & AMD_CPPC_ENABLE) )
> > +    {
> > +        val |= AMD_CPPC_ENABLE;
> > +        if ( wrmsr_safe(MSR_AMD_CPPC_ENABLE, val) )
> > +        {
> > +            amd_cppc_err(policy->cpu,
> "wrmsr_safe(MSR_AMD_CPPC_ENABLE, %lx)\n", val);
> > +            goto err;
> > +        }
> > +    }
> > +
> > +    if ( rdmsr_safe(MSR_AMD_CPPC_CAP1, data->caps.raw) )
> > +    {
> > +        amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_CAP1)\n");
> > +        goto err;
> > +    }
> > +
> > +    if ( data->caps.highest_perf == 0 || data->caps.lowest_perf == 0 ||
> > +         data->caps.nominal_perf == 0 || data->caps.lowest_nonlinear_perf 
> > == 0 )
> > +    {
> > +        amd_cppc_err(policy->cpu,
> > +                     "Platform malfunction, read CPPC highest_perf: %u,
> lowest_perf: %u, nominal_perf: %u, lowest_nonlinear_perf: %u zero value\n",
> > +                     data->caps.highest_perf, data->caps.lowest_perf,
> > +                     data->caps.nominal_perf, 
> > data->caps.lowest_nonlinear_perf);
> > +        goto err;
> > +    }
> > +
> > +    data->err = amd_get_min_freq(data, &min_freq);
> > +    if ( data->err )
> > +        return;
> > +
> > +    data->err = amd_get_nominal_freq(data, &nominal_freq);
> > +    if ( data->err )
> > +        return;
> > +
> > +    data->err = amd_get_max_freq(data, &max_freq);
> > +    if ( data->err )
> > +        return;
> > +
> > +    if ( min_freq > max_freq )
> > +    {
> > +        amd_cppc_err(policy->cpu, "min_freq(%u) or max_freq(%u) value is
> incorrect\n",
> > +                     min_freq, max_freq);
> > +        goto err;
> > +    }
>
> And nominal freq not being between the two is okay?
>
> > +    policy->min = min_freq;
> > +    policy->max = max_freq;
> > +
> > +    policy->cpuinfo.min_freq = min_freq;
> > +    policy->cpuinfo.max_freq = max_freq;
> > +    policy->cpuinfo.perf_freq = nominal_freq;
> > +    policy->cur = nominal_freq;
>
> How do you know this is correct? Or does it simply not matter at this point?
>

I need policy->cur to be set for correctly using ondemand governor.
As we don't have any MSR register to reflect runtime perf/freq value under CPPC 
control,
I will suggest to use APERF/MPERF average frequency as current freq here.

> > +    /* Initial processor data capability frequencies */
> > +    data->min_freq = min_freq;
> > +    data->nominal_freq = nominal_freq;
> > +    data->max_freq = max_freq;
>
> Is this data duplication actually needed? Can't data, if necessary, simply 
> have a
> back pointer to the policy for the CPU?
>
> Actually, aren't two of the three values already accessible through the 
> cppc_data
> pointer hanging off of data? Which in turn makes me wonder why you go through
> the effort of calculating those values.
>

cppc_data->lowest/nominal frequency comes from ACPI _CPC entry, and they are
optional fields. So we need to calculate them when unavailable.
We need them to set policy->min/max, which are referred in ondemand governor.
But you're right, at least in this patch, we are not using 
data->min/max/nominal anywhere

> > + err:
> > +    data->err = -EINVAL;
> > +}
>
> At this point you may have set the enable bit already, which you can't undo.
> What effect will this have on the system when the error path is taken here?
> Especially if we then try to fall back to another driver?
>

On current code logic, when the error path is taken, amd-cppc cpufreq driver 
fails
to initialize. And we will also not fall back to another driver.
As we could register *only one* cpufreq driver. If amd-cppc is registered 
properly
before, then legacy P-states will not get registered.
In amd-cppc registration code:
```
+int __init amd_cppc_register_driver(void)
+{
+    if ( !cpu_has_cppc )
+        return -ENODEV;
+
+    return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+}
```
CPPC feature MSR gets checked before the registration, which is to check 
whether the
hardware has CPPC msr support.

> Jan

Many thanks,
Penny

Reply via email to