On 28/01/16 05:12, Feng Wu wrote:
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update posted-interrupts during the following
> scenarios:
> - vCPU is preempted
> - vCPU is slept
> - vCPU is blocked
>
> When a vCPU is preempted/slept, we update the posted-interrupts during
> scheduling by introducing two new architectural scheduler hooks:
> vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU is blocked, we
> introduce a new architectural hook, arch_vcpu_block(), to update the
> posted-interrupts descriptor.
>
> Besides that, before VM-entry, we make sure the 'NV' field is set
> to 'posted_intr_vector' and the vCPU is not in any blocking list, which
> is needed when the vCPU is running in non-root mode. We do this check
> because we change the posted-interrupts descriptor in vcpu_block(),
> but we don't change it back in vcpu_unblock() or when vcpu_block()
> returns directly due to event delivery (in fact, we don't need to do it
> in those two places, which is why we do it before VM-entry).
>
> When we handle the lazy context switch for the following two scenarios:
> - preempted by a tasklet, which runs in an idle context;
> - the previous vCPU is offline and there are no new runnable vCPUs in
>   the run queue;
> we don't change the 'SN' bit in the posted-interrupt descriptor. This
> may incur spurious PI notification events, but a PI notification event
> is only sent when 'ON' is clear, and once it is sent, ON is set by
> hardware, so no further notification events are sent until 'ON' is
> cleared. Besides that, spurious PI notification events happen from time
> to time in the Xen hypervisor anyway: for example, when a guest traps to
> Xen and a PI notification event arrives, there is nothing Xen actually
> needs to do about it, since the interrupts will be delivered to the guest
> the next time we do a VM-entry.
>
> CC: Keir Fraser <k...@xen.org>
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Andrew Cooper <andrew.coop...@citrix.com>
> CC: Kevin Tian <kevin.t...@intel.com>
> CC: George Dunlap <george.dun...@eu.citrix.com>
> CC: Dario Faggioli <dario.faggi...@citrix.com>
> Suggested-by: Yang Zhang <yang.z.zh...@intel.com>
> Suggested-by: Dario Faggioli <dario.faggi...@citrix.com>
> Suggested-by: George Dunlap <george.dun...@eu.citrix.com>
> Suggested-by: Jan Beulich <jbeul...@suse.com>
> Signed-off-by: Feng Wu <feng...@intel.com>
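To make the interplay between the hooks and the descriptor fields in the quoted description concrete, here is a minimal user-space C model of the fields it mentions (ON, SN, NV, NDST) plus the blocking-list membership. Everything in it is hypothetical: struct pi_model, the vector values, and the model_* functions only mirror the roles of vmx_pi_switch_to(), vmx_pi_switch_from(), arch_vcpu_block() and the VM-entry check; they are not Xen's actual types or signatures. The model also ignores the URG bit and treats NDST as a plain pCPU number rather than an APIC ID.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vector numbers for this model only. */
#define POSTED_INTR_VECTOR 0xf2u
#define PI_WAKEUP_VECTOR   0xf1u

/* Toy posted-interrupt descriptor: only the fields discussed here. */
struct pi_model {
    unsigned long long pir; /* toy PIR: tracks vectors 0-63 only */
    bool on;                /* Outstanding Notification */
    bool sn;                /* Suppress Notification */
    unsigned nv;            /* Notification Vector */
    unsigned ndst;          /* Notification Destination (plain pCPU number) */
    bool on_blocked_list;   /* stands in for the per-pCPU blocking list */
};

/* runnable -> running: the role of vmx_pi_switch_to(). */
static void model_switch_to(struct pi_model *d, unsigned processor)
{
    d->ndst = processor;
    d->sn = false;
}

/* running -> runnable: the role of vmx_pi_switch_from(). */
static void model_switch_from(struct pi_model *d)
{
    d->sn = true;
}

/* running -> blocked: the role of arch_vcpu_block(). */
static void model_block(struct pi_model *d)
{
    assert(!d->on_blocked_list);   /* must not be queued twice */
    d->nv = PI_WAKEUP_VECTOR;
    d->on_blocked_list = true;
}

/* Before VM-entry: undo the blocked-state settings if still present. */
static void model_before_vmentry(struct pi_model *d)
{
    /* Idempotent: harmless when the vCPU never blocked. */
    d->nv = POSTED_INTR_VECTOR;
    d->on_blocked_list = false;
}

/*
 * A device posts vector vec. The PIR bit is always recorded, but a
 * notification is sent only if ON and SN are both clear; sending it
 * sets ON, so later posts are coalesced until software clears ON.
 * Returns the notification vector sent, or 0 if none was sent.
 */
static unsigned model_post(struct pi_model *d, unsigned vec)
{
    d->pir |= 1ULL << (vec & 63);
    if (d->on || d->sn)
        return 0;
    d->on = true;
    return d->nv;
}

/* Software drains PIR into the virtual IRR: clear ON, return pending bits. */
static unsigned long long model_drain_pir(struct pi_model *d)
{
    unsigned long long pending = d->pir;
    d->on = false;
    d->pir = 0;
    return pending;
}
```

In this model the spurious-notification argument is visible directly: while ON stays set, model_post() returns 0 however often it is called, so leaving SN clear in the two lazy-context-switch cases costs at most one extra notification before software drains the PIR. model_before_vmentry() is deliberately unconditional, mirroring the choice to undo the blocked-state settings on the VM-entry path rather than in vcpu_unblock().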

Feng,

Thanks for your work on this.

Coming back to this thread after 5 months, what strikes me first of all is that it would be good to have a comment somewhere laying out exactly all the things that need to change for the different runstates with posted interrupts, so that someone later trying to change things has an overview of what invariants need to be kept.

What do you think about adding the following comment somewhere near the PI callbacks? (Corrected for accuracy, of course.)

---
To handle posted interrupts correctly, we need to set the following state:

 * The PI notification vector (NV)
 * The PI notification destination processor (NDST)
 * The PI "suppress notification" bit (SN)
 * The vcpu pi "blocked" list

If a vcpu is currently running, we want the PI delivered to the guest vcpu on the proper pcpu (NDST = v->processor, SN clear).

If the vcpu is blocked, we want the PI delivered to Xen so that it can wake it up (SN clear, NV = pi_wakeup_vector, vcpu on block list).

If the vcpu is currently either preempted or offline (i.e., not running for some reason other than blocking while waiting for an interrupt), there's nothing Xen can do: we want the interrupt pending bit set in the guest, but we don't want to bother Xen with an interrupt (SN set).

There's a brief window of time between vmx_intr_assist() and checking softirqs where an interrupt that comes in may be lost, so we need Xen to get an interrupt and raise a softirq so that it will go through the vmx_intr_assist() path again (SN clear, NV = posted_intr_vector).

The way we implement this now is by looking at what needs to happen on the following runstate transitions:

A: runnable -> running
  - SN = 0
  - NDST = v->processor
B: running -> runnable
  - SN = 1
C: running -> blocked
  - NV = pi_wakeup_vector
  - Add vcpu to blocked list
D: blocked -> runnable
  - NV = posted_intr_vector
  - Take vcpu off blocked list

For transitions A and B, we add hooks into the vmx_ctxt_switch_{from,to} paths.
For transition C, we add a new arch hook, arch_vcpu_block(), which is called from vcpu_block() and vcpu_do_poll(). For transition D, rather than adding an extra arch hook on vcpu_wake, we add a hook on the vmentry path which checks whether either of the two actions needs to be taken.

These hooks only need to be called when the domain in question actually has a physical device assigned to it, so we set and clear the callbacks as appropriate when device assignment changes.
---

Is that about right? If we had this, I don't think we'd need the comments in vmx_pi_switch_{from,to}.

Laying things out this way also makes me wonder whether it might be more sensible / robust to set NDST on the vmentry path in the same way we set NV. But at this point it's just bikeshedding, so feel free to leave it where it is.

Other than that (and the details about placement of the ASSERT and the hook reassignment), it all looks good to me.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel