On 10/09/19 12:23, Igor Mammedov wrote:
> On Tue, 8 Oct 2019 23:12:10 +0200
> Laszlo Ersek <ler...@redhat.com> wrote:
> 
>> On 10/08/19 17:06, Igor Mammedov wrote:
>>> On Tue,  8 Oct 2019 13:27:14 +0200
>>> Laszlo Ersek <ler...@redhat.com> wrote:
>>>  
>>>> MaxCpuCountInitialization() currently handles the following options:
>>>>
>>>> (1) QEMU does not report the boot CPU count.
>>>>
>>>>     In this case, PlatformPei makes MpInitLib enumerate APs up to the
>>>>     default PcdCpuMaxLogicalProcessorNumber value (64) minus 1, or until
>>>>     the default PcdCpuApInitTimeOutInMicroSeconds (50,000) elapses.
>>>>     (Whichever is reached first.)
>>>>
>>>>     Time-limited AP enumeration had never been reliable on QEMU/KVM, which
>>>>     is why commit 45a70db3c3a5 strated handling case (2) below, in OVMF.
>>>>
>>>> (2) QEMU reports the boot CPU count.
>>>>
>>>>     In this case, PlatformPei sets PcdCpuMaxLogicalProcessorNumber to the
>>>>     reported boot CPU count, and PcdCpuApInitTimeOutInMicroSeconds to
>>>>     practically "infinity" (MAX_UINT32, ~71 minutes). That causes
>>>>     MpInitLib to enumerate exactly the available (boot) APs.
>>>>
>>>>     With CPU hotplug in mind, this method is not good enough. While
>>>>     UefiCpuPkg expects PcdCpuMaxLogicalProcessorNumber to include both
>>>>     boot (i.e., cold-plugged), and all *potentially* hot-plugged, logical
>>>>     processors, the boot CPU count reported by QEMU does not cover the
>>>>     second category.  
>>>
>>> Can you elaborate why it doesn't cover the second category?  
>>
>> (Elaborate just in this thread, or elaborate in the commit message?)
>>
>> The boot CPU count reported by QEMU does not cover all potentially
>> hot-plugged logical processors ... because it is not supposed to.
>>
>> Namely, FW_CFG_NB_CPUS exposes "pcms->boot_cpus", and that is not
>> supposed to cover all potentially hot-plugged CPUs, only the boot-time
>> CPUs.
> 'potentially hotplugged' confuses me greatly in this context
> (I'm not sure if you talk about hotplugged or not existing yet CPUs).
> 
> If it's about possible CPUs limit, I'd use "possible CPUs"
> instead (wich includes present and not present CPUs).

Good idea. I was struggling to find the correct term. I realize the
"possible CPUs" term has been used in QEMU forever. I'll update the
commit message.

> 
> 
>> Whereas PcdCpuMaxLogicalProcessorNumber needs to include all potentially
>> hot-plugged CPUs too. For that, we need to expose "ms->smp.max_cpus"
>> from QEMU.
>>
>> Does this answer your question?
> You answered question better below
> 
> [...]
> 
>> When a CPU is hotplugged at OS runtime, "pcms->boot_cpus" increases in
>> QEMU. 
>> Before the patch, the "pcms->boot_cpus" increase causes
>> PcdCpuMaxLogicalProcessorNumber to increase as well. That breaks S3
>> resume. PcdCpuMaxLogicalProcessorNumber must not change from initial
>> boot to S3 resume.
> Probably something along this lines should be put in commit message,
> which clearly states a requirement for max_cpus

OK, I'll work that into the commit message too.

Thanks!
Laszlo

> 
>>
>> After the patch, the "pcms->boot_cpus" increase does not increase
>> PcdCpuMaxLogicalProcessorNumber. PcdCpuMaxLogicalProcessorNumber remains
>> the same (namely, "ms->smp.max_cpus"). Therefore, the CPU structures
>> allocated during normal boot, for "ms->smp.max_cpus", will accommodate
>> the CPUs that have been hotplugged prior to S3 resume.
>>
>> I tested the S3 behavior, plus the values mentioned above are also
>> logged to the OVMF debug log, by this patch:
>>
>>> +  DEBUG ((DEBUG_INFO, "%a: BootCpuCount=%d mMaxCpuCount=%u\n", 
>>> __FUNCTION__,
>>> +    BootCpuCount, mMaxCpuCount));  
>>
>> This is one libvirt domain XML snippet I've used for testing:
>>
>>   <vcpu placement='static' current='3'>4</vcpu>
>>   <vcpus>
>>     <vcpu id='0' enabled='yes' hotpluggable='no'/>
>>     <vcpu id='1' enabled='yes' hotpluggable='no'/>
>>     <vcpu id='2' enabled='yes' hotpluggable='no'/>
>>     <vcpu id='3' enabled='no' hotpluggable='yes'/>
>>   </vcpus>
>>   <os>
>>     <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
>> ...
>>   <cpu mode='custom' match='exact' check='partial'>
>>     <model fallback='allow'>Haswell-noTSX</model>
>>     <topology sockets='1' cores='2' threads='2'/>
>>     <feature policy='require' name='vmx'/>
>>     <feature policy='require' name='pcid'/>
>>     <feature policy='require' name='spec-ctrl'/>
>>     <feature policy='require' name='pdpe1gb'/>
>>   </cpu>
>>
>> And the command used for testing was:
>>
>> $ virsh setvcpu GUEST 3 --enable --live
>>
>> When this is done, the guest kernel dmesg confirms the CPU hotplug, but
>> it is not onlined automatically. I onlined it via /sys manually in the
>> guest.
>>
>> After that point, I exercised S3 suspend / resume. During S3 resume, the
>> "BootCpuCount" log (from above) increases from 3 to 4 (and the related
>> edk2 infrastructure messages are in sync). All the while "mMaxCpuCount"
>> preserves its initial value, 4.
>>
>> Finally, after S3 resume, I ran "rdmsr -a 0x3a" in the guest, which
>> printed "5" for all *four* CPUs. This confirms that OVMF has set the
>> Feature Control MSR on the new VCPU too, during S3 resume.
>>
>> (The MSR is zeroed on platform reset, and has to be re-configured by
>> firmware, on every CPU, during S3. It is set to 5 due to the "vmx" CPU
>> feature in the above domain XML snippet. References:
>>
>> - https://bugzilla.tianocore.org/show_bug.cgi?id=86
>> - 
>> https://github.com/tianocore/tianocore.github.io/wiki/Testing-SMM-with-QEMU,-KVM-and-libvirt#confirm-simple-multiprocessing-during-boot
>>
>> The specific meaning of the MSR is totally irrelevant now, it's just a
>> good use case for testing multi-processing during S3 resume. For MSR
>> 0x3A (Feature Control), the firmware must execute the wrmsr
>> instructionon on each CPU, and the "rdmsr -a" Linux userspace command
>> reads back the MSR on each CPU separately.)
>>
>> Thanks!
>> Laszlo
> 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#48661): https://edk2.groups.io/g/devel/message/48661
Mute This Topic: https://groups.io/mt/34441232/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to