below

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, August 16, 2019 3:20 PM
> To: Yao, Jiewen <jiewen....@intel.com>; Laszlo Ersek
> <ler...@redhat.com>; de...@edk2.groups.io
> Cc: edk2-rfc-groups-io <r...@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imamm...@redhat.com>;
> Chen, Yingwen <yingwen.c...@intel.com>; Nakajima, Jun
> <jun.nakaj...@intel.com>; Boris Ostrovsky <boris.ostrov...@oracle.com>;
> Joao Marcal Lemos Martins <joao.m.mart...@oracle.com>; Phillip Goerl
> <phillip.go...@oracle.com>
> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
>
> On 16/08/19 04:46, Yao, Jiewen wrote:
> > Comment below:
> >
> >
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> Sent: Friday, August 16, 2019 12:21 AM
> >> To: Laszlo Ersek <ler...@redhat.com>; de...@edk2.groups.io; Yao, Jiewen
> >> <jiewen....@intel.com>
> >> Cc: edk2-rfc-groups-io <r...@edk2.groups.io>; qemu devel list
> >> <qemu-devel@nongnu.org>; Igor Mammedov <imamm...@redhat.com>;
> >> Chen, Yingwen <yingwen.c...@intel.com>; Nakajima, Jun
> >> <jun.nakaj...@intel.com>; Boris Ostrovsky <boris.ostrov...@oracle.com>;
> >> Joao Marcal Lemos Martins <joao.m.mart...@oracle.com>; Phillip Goerl
> >> <phillip.go...@oracle.com>
> >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> >>
> >> On 15/08/19 17:00, Laszlo Ersek wrote:
> >>> On 08/14/19 16:04, Paolo Bonzini wrote:
> >>>> On 14/08/19 15:20, Yao, Jiewen wrote:
> >>>>>> - Does this part require a new branch somewhere in the OVMF SEC code?
> >>>>>>   How do we determine whether the CPU executing SEC is BSP or
> >>>>>>   hot-plugged AP?
> >>>>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
> >>>>> There are some hardware specific registers can be used to determine if the CPU is new added.
> >>>>> I don’t think this must be same as the real hardware.
> >>>>> You are free to invent some registers in device model to be used in
> >>>>> OVMF hot plug driver.
> >>>>
> >>>> Yes, this would be a new operation mode for QEMU, that only applies to
> >>>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
> >>>> fact it doesn't reply to anything at all.
> >>>>
> >>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
> >>>>>>   it should execute code at a particular pflash location.)
> >>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> >>>>
> >>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> >>>> QEMU. The AP does not start execution at all when it is unplugged, so
> >>>> no cache-as-RAM etc.
> >>>>
> >>>> We only need to modify QEMU so that hot-plugged APs do not reply to
> >>>> INIT/SIPI/SMI.
> >>>>
> >>>>> I don’t think there is problem for real hardware, who always has CAR.
> >>>>> Can QEMU provide some CPU specific space, such as MMIO region?
> >>>>
> >>>> Why is a CPU-specific region needed if every other processor is in SMM
> >>>> and thus trusted?
> >>>
> >>> I was going through the steps Jiewen and Yingwen recommended.
> >>>
> >>> In step (02), the new CPU is expected to set up RAM access. In step
> >>> (03), the new CPU, executing code from flash, is expected to "send board
> >>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> >>> message." For that action, the new CPU may need a stack (minimally if we
> >>> want to use C function calls).
> >>>
> >>> Until step (03), there had been no word about any other (= pre-plugged)
> >>> CPUs (more precisely, Jiewen even confirmed "No impact to other
> >>> processors"), so I didn't assume that other CPUs had entered SMM.
> >>>
> >>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> >>> as I can. I'm still very confused.
> >>> If you have a better understanding,
> >>> could you please write up the 15-step process from the thread starter
> >>> again, with all QEMU customizations applied? Such as, unnecessary steps
> >>> removed, and platform specifics filled in.
> >>
> >> Sure.
> >>
> >> (01a) QEMU: create new CPU. The CPU already exists, but it does not
> >>      start running code until unparked by the CPU hotplug controller.
> >>
> >> (01b) QEMU: trigger SCI
> >>
> >> (02-03) no equivalent
> >>
> >> (04) Host CPU: (OS) execute GPE handler from DSDT
> >>
> >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> >>      will not enter SMM because SMI is disabled)
> >>
> >> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >>      rebase code.
> >>
> >> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> >>      new CPU
> >>
> >> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> > [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
> > restriction that INIT/SIPI/SIPI can only be sent in SMM.
>
> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
> before 07a, so this is okay.

[Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
I don’t see any extra step between 06 and 07a.
What is the magic here?
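
To make the ordering in (01a), (05) and (07a)-(07b) easier to follow, here is a toy Python model of one possible reading of the proposal: a hot-plugged CPU stays "parked" and drops INIT, SIPI and SMI until the hotplug controller enables it in (07a). This is only a sketch. The names `ParkedCpu`, `deliver` and `unpark` are invented for illustration and are not QEMU or edk2 interfaces, and the thread itself does not confirm that QEMU would implement parking this way.

```python
from enum import Enum, auto


class Event(Enum):
    INIT = auto()
    SIPI = auto()
    SMI = auto()


class ParkedCpu:
    """Toy model (hypothetical, not QEMU code) of a hot-plugged CPU."""

    def __init__(self):
        self.parked = True    # (01a): the CPU exists but runs no code yet
        self.delivered = []   # events the CPU actually acted on
        self.discarded = []   # events dropped while the CPU was parked

    def unpark(self):
        # (07a): the hotplug controller enables the new CPU.
        self.parked = False

    def deliver(self, event):
        # Assumption: a parked CPU does not reply to anything at all.
        if self.parked:
            self.discarded.append(event)
            return False
        self.delivered.append(event)
        return True


cpu = ParkedCpu()
cpu.deliver(Event.SMI)     # (05): the broadcast SMI does not reach the new CPU
cpu.deliver(Event.INIT)    # an early INIT, from any sender, is dropped as well
cpu.unpark()               # (07a)
for ev in (Event.INIT, Event.SIPI, Event.SIPI):   # (07b)
    cpu.deliver(ev)
```

Under this reading, the difference between "before 07a" and "at 07b" is the controller write itself, which changes the new CPU from parked to enabled.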
> However I do see a problem, because a PCI device's DMA could overwrite
> 0x38000 between (06) and (10) and hijack the code that is executed in
> SMM. How is this avoided on real hardware? By the time the new CPU
> enters SMM, it doesn't run off cache-as-RAM anymore.

[Jiewen] Interesting question.
I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below:
-- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
-- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.

I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.
I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?

> Paolo
>
> >> (08a) New CPU: (Low RAM) Enter protected mode.
> >
> > [Jiewen] NOTE: The new CPU still cannot use any physical memory, because
> > the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.
> >
> >> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
> >>
> >> (09) Host CPU: (SMM) Send SMI to the new CPU only.
> >>
> >> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
> >>      TSEG.
> >>
> >> (11) Host CPU: (SMM) Restore 38000.
> >>
> >> (12) Host CPU: (SMM) Update located data structure to add the new CPU
> >>      information. (This step will involve CPU_SERVICE protocol)
> >>
> >> (13) New CPU: (Flash) do whatever other initialization is needed
> >>
> >> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> >>
> >> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
> >>
> >>
> >> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> >> INIT-SIPI-SIPI sequence of 07b-08a-08b.
> > [Jiewen] I am OK with this proposal.
> > I think the rule is same - the new CPU CANNOT touch any system memory,
> > no matter it is from reset-vector or from INIT/SIPI/SIPI.
> > Or I would say: if the new CPU wants to touch some memory before first SMI, the memory should be
> > CPU specific or on the flash.
> >
> >
> >
> >>>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> >>>> to 0xB2 when hotplug happens. It could write a well-known value to
> >>>> 0xB2, to be read by an SMI handler in edk2.
> >>>
> >>> I dislike involving QEMU's generated DSDT in anything SMM (even
> >>> injecting the SMI), because the AML interpreter runs in the OS.
> >>>
> >>> If a malicious OS kernel is a bit too enlightened about the DSDT, it
> >>> could willfully diverge from the process that we design. If QEMU
> >>> broadcast the SMI internally, the guest OS could not interfere with that.
> >>>
> >>> If the purpose of the SMI is specifically to force all CPUs into SMM
> >>> (and thereby force them into trusted state), then the OS would be
> >>> explicitly counter-interested in carrying out the AML operations from
> >>> QEMU's DSDT.
> >>
> >> But since the hotplug controller would only be accessible from SMM,
> >> there would be no other way to invoke it than to follow the DSDT's
> >> instruction and write to 0xB2. FWIW, real hardware also has plenty of
> >> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> >> access).
> >>
> >> Paolo
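
The last argument above (the hotplug controller is reachable only from SMM, so the OS has no way to enable a new CPU except by writing to 0xB2) can be sketched as a small Python model. Everything here is a simplification under stated assumptions: `Machine`, `write_hotplug_controller`, `outb` and `smi_handler` are hypothetical names, and `HOTPLUG_SMI_VALUE` is a placeholder for the "well-known value", which the thread does not specify.

```python
APM_CNT_PORT = 0xB2        # SMI command port named in the thread
HOTPLUG_SMI_VALUE = 0x01   # placeholder; the real value is not given


class Machine:
    """Toy model (hypothetical) of an SMM-gated CPU hotplug controller."""

    def __init__(self):
        self.in_smm = False
        self.new_cpu_enabled = False

    def write_hotplug_controller(self):
        # Assumption: the controller ignores accesses made outside SMM.
        if not self.in_smm:
            return False
        self.new_cpu_enabled = True   # corresponds to step (07a)
        return True

    def smi_handler(self, value):
        # Firmware code, running with all existing CPUs in SMM.
        if value == HOTPLUG_SMI_VALUE:
            self.write_hotplug_controller()

    def outb(self, port, value):
        # OS-visible port write; 0xB2 raises an SMI, as in step (05).
        if port == APM_CNT_PORT:
            self.in_smm = True
            self.smi_handler(value)
            self.in_smm = False       # return from SMM


machine = Machine()
direct_attempt = machine.write_hotplug_controller()  # OS pokes the controller
enabled_after_direct = machine.new_cpu_enabled
machine.outb(APM_CNT_PORT, HOTPLUG_SMI_VALUE)        # OS follows the DSDT
```

In this model a malicious OS can refuse to write to 0xB2, which only leaves the new CPU parked. It cannot reach the controller by any other path.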