
Hi,

> -----Original Message-----
> From: Jan Beulich <jbeul...@suse.com>
> Sent: Tuesday, March 25, 2025 6:49 PM
> To: Penny, Zheng <penny.zh...@amd.com>
> Cc: Huang, Ray <ray.hu...@amd.com>; Andrew Cooper
> <andrew.coop...@citrix.com>; Anthony PERARD <anthony.per...@vates.tech>;
> Orzel, Michal <michal.or...@amd.com>; Julien Grall <jul...@xen.org>; Roger
> Pau Monné <roger....@citrix.com>; Stefano Stabellini <sstabell...@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc
> driver in active mode
>
> On 06.03.2025 09:39, Penny Zheng wrote:
> > amd-cppc has 2 operation modes: autonomous (active) mode and
> > non-autonomous (passive) mode.
> > In active mode, the platform ignores requests made in the Desired
> > Performance Target register and takes into account only the values set
> > in the minimum, maximum and energy performance preference (EPP)
> > registers.
> > The EPP is used in the CCLK DPM controller to drive the frequency that
> > a core is going to operate at during short periods of activity.
> > The SOC EPP targets are configured on a scale from 0 to 255, where 0
> > represents maximum performance and 255 represents maximum efficiency.
>
> So this is the other way around from "perf" values, where aiui 0xff is "highest"?
>

Yes, it is not a perf value. It is an abstract preference on a scale from 0 to 255 that runs in the opposite direction: 0 favours maximum performance and 255 maximum efficiency.
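To make the inverted direction concrete, below is a minimal sketch of how the four 8-bit fields might be packed into the request register. The field positions (max perf in bits 7:0, min perf in 15:8, desired perf in 23:16, EPP in 31:24) follow the layout used by the Linux amd-pstate driver and are an assumption here; the macro and helper names are hypothetical and not part of this patch. The argument order mirrors the quoted amd_cppc_write_request() call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed MSR_AMD_CPPC_REQ field layout (as used by Linux amd-pstate);
 * check against the PPR before relying on it.
 */
#define CPPC_MAX_PERF_SHIFT 0
#define CPPC_MIN_PERF_SHIFT 8
#define CPPC_DES_PERF_SHIFT 16
#define CPPC_EPP_SHIFT      24

/* Hypothetical names for the two ends of the EPP scale. */
#define EPP_MAX_PERFORMANCE 0x00  /* 0 favours performance */
#define EPP_MAX_EFFICIENCY  0xff  /* 255 favours energy efficiency */

/* Pack min/desired/max perf and EPP into one request value. */
static uint64_t cppc_req_pack(uint8_t min_perf, uint8_t des_perf,
                              uint8_t max_perf, uint8_t epp)
{
    return ((uint64_t)max_perf << CPPC_MAX_PERF_SHIFT) |
           ((uint64_t)min_perf << CPPC_MIN_PERF_SHIFT) |
           ((uint64_t)des_perf << CPPC_DES_PERF_SHIFT) |
           ((uint64_t)epp      << CPPC_EPP_SHIFT);
}

/* Extract the EPP byte (bits 31:24) from a request value. */
static uint8_t cppc_req_epp(uint64_t req)
{
    return (uint8_t)(req >> CPPC_EPP_SHIFT);
}
```

So a larger value in a perf field asks for more speed, while a larger EPP byte asks for more efficiency; e.g. packing max perf 0xff with EPP 0 yields the most performance-oriented request.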

> > @@ -261,7 +276,20 @@ static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
> >          return res;
> >
> >      return amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
> > -                                  des_perf, data->caps.highest_perf);
> > +                                  des_perf, data->caps.highest_perf,
> > +                                  /* Pre-defined BIOS value for passive mode */
> > +                                  per_cpu(epp_init, policy->cpu));
> >  }
> > +
> > +static int read_epp_init(void)
> > +{
> > +    uint64_t val;
> > +
> > +    if ( rdmsr_safe(MSR_AMD_CPPC_REQ, val) )
> > +        return -EINVAL;
>
> I'm unconvinced of using rdmsr_safe() everywhere (i.e. this also goes for earlier
> patches). Unless you can give a halfway reasonable scenario under which, by the
> time we get here, there's still a chance that the MSR isn't implemented in the
> next lower layer (hardware or another hypervisor, just to explain what's meant,
> without me assuming that the driver should come into play in the first place
> when we run virtualized ourselves).
>

Correct me if I have misunderstood: the concern is that the driver may not have
the privilege to access the MSR directly in every scenario, so rdmsr_safe() with
exception handling isn't always suitable. Should I then switch them all to
rdmsrl()?

> Furthermore you call this function unconditionally, i.e. if there was a chance for
> the MSR read to fail, CPU init would needlessly fail when in passive mode.
>

The reason I also call read_epp_init() in passive mode is to avoid writing an EPP
value of zero to MSR_AMD_CPPC_REQ in passive mode; I want to keep the pre-defined
BIOS value there instead.
If we wrap read_epp_init() in an active-mode check, we may need to add an extra
read before writing the request register MSR_AMD_CPPC_REQ, introducing
MSR_AMD_CPPC_EPP_MASK to preserve the original EPP value in passive mode. Or do
you have a better suggestion?
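A minimal sketch of the masking alternative, assuming the request register layout used by Linux amd-pstate (max perf in bits 7:0, min perf in 15:8, desired perf in 23:16, EPP in 31:24). The mask macros and helper names are hypothetical stand-ins for whatever MSR_AMD_CPPC_EPP_MASK would end up looking like; the idea is to read the current value and replace only the three perf fields, so the firmware-programmed EPP is written back unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks, assuming the layout described above. */
#define CPPC_REQ_PERF_MASK 0x0000000000ffffffULL /* max, min, desired perf */
#define CPPC_REQ_EPP_MASK  0x00000000ff000000ULL /* EPP, bits 31:24 */

/* Extract the EPP byte from a raw request value (roughly what an
 * init-time read would need to keep). */
static uint8_t cppc_req_epp(uint64_t req)
{
    return (uint8_t)((req & CPPC_REQ_EPP_MASK) >> 24);
}

/*
 * Passive mode: 'cur' is the value last read from the request MSR.
 * Only the perf fields are replaced, so EPP (and any other bits)
 * are preserved as the firmware left them.
 */
static uint64_t cppc_req_update_passive(uint64_t cur, uint8_t min_perf,
                                        uint8_t des_perf, uint8_t max_perf)
{
    uint64_t perf;

    assert(min_perf <= max_perf);

    perf = (uint64_t)max_perf |
           ((uint64_t)min_perf << 8) |
           ((uint64_t)des_perf << 16);

    return (cur & ~CPPC_REQ_PERF_MASK) | perf;
}
```

The trade-off versus caching the EPP once at init (as per_cpu(epp_init) does) is an extra MSR read on every request write, in exchange for not needing the init-time read in passive mode at all.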

> > +    {
> > +        /* Force the epp value to be zero for performance policy */
> > +        epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
> > +        min_perf = max_perf;
> > +    }
> > +    else if ( policy->policy == CPUFREQ_POLICY_POWERSAVE )
> > +        /* Force the epp value to be 0xff for powersave policy */
> > +        /*
> > +         * If set max_perf = min_perf = lowest_perf, we are putting
> > +         * cpu cores in idle.
> > +         */
>
> Nit: Such two successive comments want combining. (Same near the top of the
> function, as I notice only now.)
>
> Furthermore I have trouble interpreting this comment: To me "lowest"
> doesn't mean "doing nothing" but "doing things as efficiently in terms of power
> use as possible". IOW that's not idle. Yet the comment reads as if it was meant
> to be an explanation of why we can't set max_perf from min_perf here. That is,
> no matter what's meant to be said, I think this needs re-wording (and possibly
> using subjunctive mood).
>

How about:
The lowest non-linear perf corresponds to the P2 frequency. Reducing performance
below this point would not lead to total energy savings for a given computation
(although it would reduce momentary power). So we do not suggest setting max_perf
below the lowest non-linear perf, let alone down to the lowest perf.
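A rough sketch of how the policies could map onto request fields under this rationale. The enum, structs and helper names are hypothetical; the performance branch mirrors the quoted hunk (EPP 0, min_perf pinned to max_perf), the powersave branch forces EPP 0xff, and the clamp to [lowest_nonlinear_perf, highest_perf] reflects the reasoning above rather than anything the patch is known to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two ends of the EPP scale. */
#define EPP_MAX_PERFORMANCE 0x00
#define EPP_MAX_EFFICIENCY  0xff

/* Hypothetical stand-ins for the cpufreq policies the patch checks. */
enum policy { POLICY_PERFORMANCE, POLICY_POWERSAVE, POLICY_OTHER };

struct perf_caps {
    uint8_t lowest_perf;
    uint8_t lowest_nonlinear_perf;
    uint8_t highest_perf;
};

struct epp_req {
    uint8_t min_perf;
    uint8_t max_perf;
    uint8_t epp;
};

/* Keep a perf limit within [lowest_nonlinear_perf, highest_perf]. */
static uint8_t clamp_perf(const struct perf_caps *caps, uint8_t perf)
{
    if ( perf < caps->lowest_nonlinear_perf )
        return caps->lowest_nonlinear_perf;
    if ( perf > caps->highest_perf )
        return caps->highest_perf;
    return perf;
}

static struct epp_req map_policy(const struct perf_caps *caps,
                                 enum policy pol, uint8_t min_perf,
                                 uint8_t max_perf, uint8_t epp)
{
    struct epp_req r;

    assert(caps->lowest_perf <= caps->lowest_nonlinear_perf &&
           caps->lowest_nonlinear_perf <= caps->highest_perf);

    /* Going below lowest_nonlinear_perf saves no total energy. */
    r.max_perf = clamp_perf(caps, max_perf);
    r.min_perf = clamp_perf(caps, min_perf);
    if ( r.min_perf > r.max_perf )
        r.min_perf = r.max_perf;
    r.epp = epp;

    if ( pol == POLICY_PERFORMANCE )
    {
        /* EPP 0 and min_perf pinned to max_perf, as in the quoted hunk. */
        r.epp = EPP_MAX_PERFORMANCE;
        r.min_perf = r.max_perf;
    }
    else if ( pol == POLICY_POWERSAVE )
        /* EPP 0xff; the clamped perf limits are left as they are. */
        r.epp = EPP_MAX_EFFICIENCY;

    return r;
}
```

With caps of {10, 40, 200}, a powersave request for min 5 / max 20 would come out as min 40 / max 40 with EPP 0xff, i.e. the floor stays at the lowest non-linear perf instead of dropping to the lowest perf.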

> Jan
