On 11/24/16 01:01, Laszlo Ersek wrote: > CC Jordan & Mike > > On 11/23/16 23:35, Paolo Bonzini wrote: >> >> >> On 18/11/2016 11:36, Laszlo Ersek wrote: >>> This is v3 of the series, with updates based on the v2 discussion: >>> <http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02687.html>. >>> >>> I've added feature negotiation via the APM_STS ("scratchpad") register. >>> A new spec file called "docs/specs/q35-apm-sts.txt" is included. >>> >>> Tested with new OVMF patches (about to send out those as well). >>> Regression tested with SeaBIOS (beyond simple functional tests with >>> maximum SeaBIOS logging enabled, I used gdb to step through the new >>> ich9_apm_status_changed() callback to see if it was behaving compatibly >>> with SeaBIOS). >>> >>> The series was developed and tested on top of v2.7.0, because v2.8.0-rc0 >>> crashes very quickly for me when running OVMF: >>> >>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists >>> >>> It is my understanding that there are patches on the list for this: >>> >>> [Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes >>> >>> Anyway, the series rebases to v2.8.0-rc0 without as much as context >>> differences. >> >> Hi Laszlo, >> >> sorry for the slightly delayed reply. >> >> First of all, I'm wondering if we would be better off adding a new port >> 0xB1 that is QEMU-specific, instead of reusing 0xB3. > > Sure, I can look into that, if we agree that's the best way to proceed, > for now. (Although I'm not really happy about the new memory region > stuff it would require. :() > > I CC'd Kevin to learn if he foresaw other uses for the APM_STS register > in SeaBIOS. > > BTW, what happens in QEMU if the guest reads an unimplemented port? > Hm... unassigned_io_write() seems to be a no-op, and > unassigned_io_read() returns all-bits-one. This means that for a new > port, the negotiation protocol / values have to be reworked. > > Port 0xB1 is occupied by ICH9 according to the spec: > > Table 9-2. Fixed I/O Ranges Decoded by Intel ® ICH9 (Sheet 2 of 2) > > I/O > Address Read Target Write Target Internal Unit > ------- -------------------- -------------------- ------------- > B0h–B1h Interrupt Controller Interrupt Controller Interrupt > > I wonder if we care -- after all, APM_STS (0xB3) is documented not to > have any hardware effects ("scratchpad register"). > >> Second, I now remembered the reason why I was against broadcast SMI. >> The reason is that it breaks hot-plug. > > How does it break hot-plug? After reading your explanation below: is it > that the broadcast SMI (possibly raised by the OS directly) would get to > the new VCPU before the firmware relocated its SMBASE? > >> >> On hot-plug, the firmware (if it wants to use SMI for anything secure) >> must relocate the SMBASE of the newly-hotplugged CPU before the OS has >> any chance to put its fangs on it. This is because the OS can send >> direct SMIs and is in control of the area at 0x30000. So OVMF is >> already broken with respect to hotplug, > > Yes. We theorized that there could be further edk2 core modules that > hadn't been open sourced yet, necessary for handling VCPU hotplug. > >> but I am not yet sure if these >> patches would break it further. > > Hard to say without seeing those modules. > > I will speculate: assuming that the non-public modules fit together with > the public modules in some way, I expect the broadcast SMI shouldn't > break them. The reason is that the broadcast SMI / traditional delivery > are the default method in UefiCpuPkg, and in practice they work better > (more reliably) with the rest of the edk2 infrastructure, in my > experience, than the relaxed sync method. > > In his review today, Jordan wrote > <https://lists.01.org/pipermail/edk2-devel/2016-November/005088.html>, > >> I'm glad we'll be using a mechanism that broadcasts to all the >> processors like the real hardware. It is a bit unfortunate that it >> doesn't go through the b2 port for it. > > If broadcast is how real hardware does it (even by default!, > apparently), I expect those as-yet unreleased, hotplug-handling modules > in edk2 should be able to cope with broadcast. > >> The full solution is to follow a protocol similar to what real hardware >> does. > > Real hardware seems to use broadcast, according to the above... > > On 11/23/16 23:35, Paolo Bonzini wrote: > >> On hot-plug, before the new CPU starts running, the DSDT should >> generate an SMI (with a well-known value written to 0xB2). > > I'm not sure I understand right. If it is the DSDT that writes to 0xB2, > that's just another way to say, "the firmware vendor asks the operating > system to write to 0xB2". If the malicious OS is smart enough, it can > realize (from the hardware signal to run the ACPI GPE handler, IIRC) > that a new VCPU is available, and simply not trigger the SMI. > >> The handler >> then: >> >> 1) waits for SMM rendezvous >> >> 2) unparks the hotplugged VCPU. Until the VCPU is unparked, it doesn't >> react to either INITs or of course SIPIs (this would need to be >> implemented separately for both TCG and KVM! but only in QEMU, not in >> the kernel). > > Okay, this does plug the hole I sketched out above. This logic (the > QEMU-specific unparking) can be done in another platform API in OVMF I > guess (like those in SmmCpuFeaturesLib), but I wonder if we have to > provide the infrastructure in platform code up to the separate SMI > command handler. I think it again depends on those unreleased modules. > >> 3) relocates SMBASE >> >> 4) records the presence of the new VCPU's APIC id for subsequent SMIs. >> >> >> The other important things are: >> >> * new CPU-hotplug controller (docs/specs/acpi_cpu_hotplug.txt) >> interfaces. I think this would only need a new bit in the write >> register at 0x4 ("CPU device control fields"). >> >> The 0x0-0x3 write register, currently reserved for reading, might >> become read/write for easier communication with the SMI handler. The >> SMI handler would write 1 to the new bit in order to unpark the VCPU. >> >> * how to enable this. I think it would need a new SMM feature, just >> like those that you are adding here, in order to make it negotiable. If >> it is not negotiated, but broadcast SMI is negotiated, you'd need to do >> something such as not allowing CPU-hotplug. (This is the only part that >> I think is required for 2.9). > > My hope is that we can work on this incrementally (or at least that we > don't forfeit our chance at SMM + VCPU hotplug) by going down the > broadcast SMI route first. Based on what Jordan wrote, this hope should > not be unfounded. > > Also, the SMM stack (across components: core edk2 code, OVMF platform > code, QEMU and KVM) is now more than a year old, and we still have > stability problems (hence this series). I'd like to stabilize the base > features before getting wild with hotplug. > > (The fact that OVMF lags behind SeaBIOS in a number of features is just > business as usual.) > > So, I'd vastly prefer if I needed to extend this series only with a way > to prevent VCPUs from being hotplugged. I think negotiating SMI > broadcast should be good enough for that, as a hook point, given that it > runs in the firmware before the OS gets control (on S3 resume too). > > I guess a global variable (or a board-level query function) that would > cause pc_query_hotpluggable_cpus() to return an empty list would be > frowned upon; what's the recommended method? > > There is DeviceClass.hotpluggable, but that's not a dynamic property. > >> >> In order to trigger the SMI, the >> >> ifctx = aml_if(aml_equal(ins_evt, one)); >> { >> aml_append(ifctx, >> aml_call2(CPU_NOTIFY_METHOD, cpu_data, dev_chk)); >> aml_append(ifctx, aml_store(one, ins_evt)); >> aml_append(ifctx, aml_store(one, has_event)); >> } >> >> would be replaced by something like this pseudo-AML: >> >> Store(And(smm_features, 2), Local1) >> ... >> Store(next_cpu_cmd, cpu_cmd) >> If (Equal(ins_evt, One)) { >> If (Greater(Local1, Zero)) { >> Store(CPU_HP_APM_CMD, apm_cmd) >> } >> CPU_NOTIFY_METHOD(cpu_data, dev_chk) >> Store(One, ins_evt) >> Store(One, has_event) >> } >> >> >> Of course all this is for OVMF only. SeaBIOS doesn't need to do >> anything of this, because it actually likes to have its SMIs only on the >> current CPU (and it had better be the BSP, since SMBASE is not relocated >> elsewhere!). >> >> Igor, any thoughts? >> >> I understand that this is quite huge in both OVMF and QEMU, but we've >> only been delaying it and we knew about it. :( > > I agree about being aware of lack of VCPU hotplug support wrt. SMM. > > I disagree about delaying it. I've been aware of problems with the base > SMM features (of differing importance), for example > > - that S3 resume with the Ia32 build of OVMF was almost hit-or-miss, > > - or that the heavy use of UEFI variable services during Windows 8 boot > caused user-visible choppiness in the rotating animation (because of how > our SMIs worked) -- I think I even mentioned this to you. > > Since the stalemate in > > http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html > > from last November, I've never stopped disliking the lack of broadcast > SMIs, but I couldn't do anything about it, despite it leaving the base > feature set unreliable.
And let's not forget about "niceties" like - host kernel commit 879ae1880449 ("KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()"), and - kernel bugzilla <https://bugzilla.kernel.org/show_bug.cgi?id=116731>, which ate up incredible amounts of time, sending a whole lot of VFIO/OVMF/GPU-assign gamers from the vfio-users list complaining on every forum they could think of. (These issues weren't related to SMM directly, but PiSmmCpuDxeSmm has a depex on gEfiMpServiceProtocolGuid, so anything that interferes with multiprocessing in CpuDxe or MpInitLib is automatically an SMM bug too.) I can think of yet more: have you gotten around looking into <https://bugzilla.redhat.com/show_bug.cgi?id=1348092>, which is about SMM breaking on KVM hosts that lack EPT? I believe you haven't. Which is fine. I just take issue with the notion that we've been "too lazy" to enable SMM + VCPU hotplug. We got problems that come before. Thanks Laszlo > Recently, the open source edk2 SMM stack has acquired a bunch of new > code (you're aware), which exacerbate the instabilities (as I've > documented on the edk2-devel list in excruciating detail). I was > overjoyed when you finally suggested (!) to add broadcast SMI, getting > the discussion un-stuck, because (a) it miraculously fixes everything, > and (b) honestly, I see no other way from keeping the SMM stack from > falling apart otherwise, *without* VCPU hotplug in the picture. > > For the last few weeks, I've been working day and night (I'm not > exaggerating) on testing & reviewing edk2 patches related to SMM or even > just multiprocessing, catching errors, reporting them, and even > debugging and fixing them myself. (See for example commits > > 00650c531a90 UefiCpuPkg/MpInitLib/X64/MpFuncs.nasm: fix fatal typo > 4af3ae146379 UefiCpuPkg/LocalApicLib: fix feature test for Extended > Topology CPUID leaf > 1cbd83308989 UefiCpuPkg/MpInitLib: fix feature test for Extended > Topology CPUID leaf > > ). Putting out fires more or less as a norm. > > Furthermore, I haven't been pretending that VCPU hotplug "doesn't > exist", see > > https://lists.01.org/pipermail/edk2-devel/2016-November/005131.html > > for example, which I posted independently, ~8 hours ago. > > So, no. I reject the idea that we've been procrastinating SMM enabled > VCPU hotplug. We can't build the second floor while the foundation is > shaking. And then we need to ask Intel to release more code as open source. > > Thanks > Laszlo > >> >> Paolo >> >>> Cc: "Kevin O'Connor" <ke...@koconnor.net> >>> Cc: "Michael S. Tsirkin" <m...@redhat.com> >>> Cc: Gerd Hoffmann <kra...@redhat.com> >>> Cc: Paolo Bonzini <pbonz...@redhat.com> >>> >>> Thanks >>> Laszlo >>> >>> Laszlo Ersek (3): >>> hw/isa/apm: introduce callback for APM_STS_IOPORT writes >>> hw/isa/lpc_ich9: add SMI feature negotiation via APM_STS >>> hw/isa/lpc_ich9: ICH9_APM_STS_F_BROADCAST_SMI: inject SMI on all VCPUs >>> >>> docs/specs/q35-apm-sts.txt | 80 >>> ++++++++++++++++++++++++++++++++++++++++++++++ >>> include/hw/i386/ich9.h | 9 ++++++ >>> include/hw/isa/apm.h | 9 +++--- >>> hw/acpi/piix4.c | 2 +- >>> hw/isa/apm.c | 15 ++++++--- >>> hw/isa/lpc_ich9.c | 64 +++++++++++++++++++++++++++++++++++-- >>> hw/isa/vt82c686.c | 2 +- >>> 7 files changed, 168 insertions(+), 13 deletions(-) >>> create mode 100644 docs/specs/q35-apm-sts.txt >>> >