Hello Michael,

On Thu, Apr 29, 2021 at 07:56:25PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> writes:
> > From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>
> >
> > Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> > CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> > of the Extended CEDE states advertised by the platform.
> >
> > On POWER9 LPARs, the firmware advertises a very low value of 2us for
> > CEDE1 exit latency on a Dedicated LPAR. The latency advertised by the
> > PHYP hypervisor corresponds to the latency required to wakeup from the
> > underlying hardware idle state. However the wakeup latency from the
> > LPAR perspective should include
> >
> > 1. The time taken to transition the CPU from the Hypervisor into the
> >    LPAR post wakeup from platform idle state
> >
> > 2. Time taken to send the IPI from the source CPU (waker) to the idle
> >    target CPU (wakee).
> >
> > 1. can be measured via timer idle test, where we queue a timer, say
> > for 1ms, and enter the CEDE state. When the timer fires, in the timer
> > handler we compute how much extra time over the expected 1ms we have
> > consumed. On a POWER9 LPAR the numbers are
> >
> > CEDE latency measured using a timer (numbers in ns)
> > N      Min     Median   Avg       90%ile   99%ile   Max     Stddev
> > 400    2601    5677     5668.74   5917     6413     9299    455.01
> >
> > 1. and 2. combined can be determined by an IPI latency test where we
> > send an IPI to an idle CPU and in the handler compute the time
> > difference between when the IPI was sent and when the handler ran. We
> > see the following numbers on POWER9 LPAR.
> >
> > CEDE latency measured using an IPI (numbers in ns)
> > N      Min     Median   Avg       90%ile   99%ile   Max     Stddev
> > 400    711     7564     7369.43   8559     9514     9698    1200.01
> >
> > Suppose, we consider the 99th percentile latency value measured using
> > the IPI to be the wakeup latency, the value would be 9.5us. This is in
> > the ballpark of the default value of 10us.
> >
> > Hence, use the exit latency of CEDE(0) based on the latency values
> > advertized by platform only from POWER10 onwards. The values
>                                            ^^^^^^^
> > advertized on POWER10 platforms is more realistic and informed by the
> > latency measurements. For earlier platforms stick to the default value
> > of 10us.
>
> ...
>
> > diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> > index a2b5c6f..7207467 100644
> > --- a/drivers/cpuidle/cpuidle-pseries.c
> > +++ b/drivers/cpuidle/cpuidle-pseries.c
> > @@ -419,7 +419,8 @@ static int pseries_idle_probe(void)
> >  			cpuidle_state_table = shared_states;
> >  			max_idle_state = ARRAY_SIZE(shared_states);
> >  		} else {
> > -			fixup_cede0_latency();
> > +			if (pvr_version_is(PVR_POWER10))
> > +				fixup_cede0_latency();
>
> A PVR check like that tests for *only* Power10, not Power10 and onwards
> as you say in the change log.

Right. The accurate thing would be to do the fixup only for

	!(PVR_POWER4 || PVR_POWER4p || PVR_POWER5 || PVR_POWER5p ||
	  PVR_POWER6 || PVR_POWER7 || PVR_POWER8 || PVR_POWER9)

But that was a bit of a mouthful.

I will go with your suggestion (from private correspondence)

	if (cpu_has_feature(CPU_FTR_ARCH_31) || pvr_version_is(PVR_POWER10))
		fixup_cede0_latency();

since it will allow the fixup for processors supporting ISA 3.1
(POWER10 and above) and also on POWER10 CPUs running in compat mode.

>
> The other question is what should happen on a Power10 LPAR that's
> running in Power9 compat mode. I assume in that case we *do* want to use
> the firmware provided values, because they're tied to the underlying
> CPU, not the compat mode?
>

Yes, the firmware provided values are tied to the underlying CPU, not
the compat mode.

> In which case a check for !PVR_POWER9 would seem to achieve what we
> want?
>
> cheers

--
Thanks and Regards
gautham.
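
For illustration only, the difference between the PVR-only check and the
combined check discussed above can be modelled outside the kernel. This
is a minimal userspace sketch, not kernel code: struct cpu_model,
MAKE_PVR(), PVR_FUTURE_HYPOTHETICAL and both helper functions are made up
for the example. The PVR_POWER9/PVR_POWER10 values are assumed to match
arch/powerpc/include/asm/reg.h, and the arch_31 flag merely stands in for
cpu_has_feature(CPU_FTR_ARCH_31). It also assumes, as the thread does,
that a POWER10 in POWER9 compat mode still reports a POWER10 PVR but does
not advertise ISA 3.1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's reg.h; verify before relying on them. */
#define PVR_POWER9              0x004E
#define PVR_POWER10             0x0080
/* Hypothetical value standing in for some post-POWER10 processor. */
#define PVR_FUTURE_HYPOTHETICAL 0x00F0

/* Build a raw PVR: processor version in the upper 16 bits, revision 0. */
#define MAKE_PVR(ver) ((uint32_t)(ver) << 16)

/* Hypothetical model of the two inputs the check looks at. */
struct cpu_model {
	uint32_t pvr;     /* raw PVR of the underlying CPU */
	bool     arch_31; /* models cpu_has_feature(CPU_FTR_ARCH_31) */
};

/* Extract the processor version field from a raw PVR. */
static uint32_t pvr_ver(uint32_t pvr)
{
	return (pvr >> 16) & 0xFFFF;
}

/* Check from the original patch: exact match on POWER10 only. */
static bool fixup_pvr_only(const struct cpu_model *cpu)
{
	return pvr_ver(cpu->pvr) == PVR_POWER10;
}

/* Revised check: ISA 3.1 capable, or a POWER10 in any mode. */
static bool fixup_feature_or_pvr(const struct cpu_model *cpu)
{
	return cpu->arch_31 || pvr_ver(cpu->pvr) == PVR_POWER10;
}
```

Under these assumptions, both checks skip the fixup on POWER9 and apply
it on POWER10 (native or POWER9 compat mode); only the combined check
also applies it on the hypothetical later processor.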