On 18.01.2022 11:48, Roger Pau Monné wrote:
> On Mon, Sep 06, 2021 at 03:02:12PM +0200, Jan Beulich wrote:
>> From: Chen Yu <[email protected]>
>>
>> Because cpuidle assumes worst-case C-state parameters, PC6 parameters
>> are used for describing C6, which is worst-case for requesting CC6.
>> When PC6 is enabled, this is appropriate. But if PC6 is disabled
>> in the BIOS, the exit latency and target residency should be adjusted
>> accordingly.
>>
>> Exit latency:
>> Previously the C6 exit latency was measured as the PC6 exit latency.
>> With PC6 disabled, the C6 exit latency should be the one of CC6.
>>
>> Target residency:
>> With PC6 disabled, the idle duration within [CC6, PC6) would make the
>> idle governor choose C1E over C6. This would cause low energy-efficiency.
>> We should lower the bar to request C6 when PC6 is disabled.
>>
>> To fill this gap, check if PC6 is disabled in the BIOS in the
>> MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
>> for C6 and set target_residency to 3 times of the new exit latency. [This
>> is consistent with how intel_idle driver uses _CST to calculate the
>> target_residency.] As a result, the OS would be more likely to choose C6
>> over C1E when PC6 is disabled, which is reasonable, because if C6 is
>> enabled, it implies that the user cares about energy, so choosing C6 more
>> frequently makes sense.
>>
>> The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
>> wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU
>> model number as SkX, but their CC6 exit latencies are similar to the SKX
>> one, 96us and 89us respectively, so reuse the SKX value for them.
>>
>> There is a concern that it might be better to use a more generic approach
>> instead of optimizing every platform. However, if the required code
>> complexity and different PC6 bit interpretation on different platforms
>> are taken into account, tuning the code per platform seems to be an
>> acceptable tradeoff.
>>
>> Link: 
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel.github.io%2Fwult%2F&amp;data=04%7C01%7Croger.pau%40citrix.com%7Ccdf115e71eb14429868408d97136a902%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637665301851513469%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=D9sIoe%2BFAvia3OuRsT19VAxkTqrlnPHnKqTSKiW6pRM%3D&amp;reserved=0
>>  # [1]
>> Suggested-by: Len Brown <[email protected]>
>> Signed-off-by: Chen Yu <[email protected]>
>> Reviewed-by: Artem Bityutskiy <[email protected]>
>> [ rjw: Subject and changelog edits ]
>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>> [Linux commit: 64233338499126c5c31e07165735ab5441c7e45a]
>>
>> Pull in Linux'es MSR_PKG_CST_CONFIG_CONTROL. Alongside the dropping of
>> "const" from skx_cstates[] add __read_mostly, and extend that to other
>> similar non-const tables.
>>
>> Signed-off-by: Jan Beulich <[email protected]>
> 
> Acked-by: Roger Pau Monné <[email protected]>

Thanks.

>> @@ -1161,6 +1161,36 @@ static void __init sklh_idle_state_table
>>  }
>>  
>>  /*
>> + * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake
>> + * idle states table.
>> + */
>> +static void __init skx_idle_state_table_update(void)
>> +{
>> +    unsigned long long msr;
>> +
>> +    rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
>> +
>> +    /*
>> +     * 000b: C0/C1 (no package C-state support)
>> +     * 001b: C2
>> +     * 010b: C6 (non-retention)
>> +     * 011b: C6 (retention)
>> +     * 111b: No Package C state limits.
>> +     */
>> +    if ((msr & 0x7) < 2) {
> 
> I wouldn't mind adding this mask field to msr-index.h.

This wouldn't buy us much since - as per the original Linux change - the
manifest constant then still wouldn't be used here.

Jan


Reply via email to