On 24.06.2013, at 14:32, Anthony Liguori wrote:

> Gleb Natapov <g...@redhat.com> writes:
>
>> On Sun, Jun 23, 2013 at 10:06:05AM -0500, Anthony Liguori wrote:
>>> On Thu, Jun 20, 2013 at 11:46 PM, Alex Williamson
>>> <alex.william...@redhat.com> wrote:
>>>> On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
>>>>> On 06/21/2013 12:34 PM, Alex Williamson wrote:
>>>>>
>>>>>
>>>>> Do not follow you, sorry. For x86, is it that MSI routing table which
>>>>> is updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what
>>>>> piece of code responds on msi_notify() in qemu-x86 and does
>>>>> qemu_irq_pulse()?
>>>>
>>>> vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)
>>>>
>>>> This writes directly to the interrupt block on the vCPU. With KVM, the
>>>> in-kernel APIC does the same write, where the pin to MSIMessage is setup
>>>> by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.
>>>
>>> What is this "interrupt block on the vCPU" you speak of? I reviewed
>> FEE00000H address as seen from PCI bus is a special address range (see
>> 10.11.1 in SDM).
>
> Ack.
>
>> Any write by a PCI device to that address range is
>> interpreted as MSI. We do not model this correctly in QEMU yet since
>> all devices, including vcpus, see exactly same memory map.
>
> This should be a per-device mapping, yes. But I'm not sure that VCPUs
> should even see anything. I don't think a VCPU can generate an MSI
> interrupt by writing to this location.
>
>>> the SDM and see nothing in the APIC protocol or the brief description
>>> of MSI as a PCI concept that would indicate anything except that the
>>> PHB handles MSI writes and feeds them to the I/O APIC.
>>>
>> I/O APIC? Did you mean APIC, but even that will probably be incorrect.
>> I'd say it translates the data to APIC bus message. And with interrupt
>> remapping there is more magic happens between MSI and APIC bus.
>
> I think the wording in the SDM allows either.
>
>>> In fact, the wikipedia article on MSI has:
>>>
>>> "A common misconception with Message Signaled Interrupts is that they
>>> allow the device to send data to a processor as part of the interrupt.
>>> The data that is sent as part of the write is used by the chipset to
>>> determine which interrupt to trigger on which processor; it is not
>>> available for the device to communicate additional information to the
>>> interrupt handler."
>>>
>> Not sure who claimed otherwise.
>
> So to summarize:
>
> 1) MSI writes are intercepted by the PHB and generates an appropriate
> IRQ.
>
> 2) The PHB has a tuple of (src device, address, data) plus whatever
> information it maintains to do the translation.
>
> 3) On Power, we can have multiple PHBs.
>
> 4) The kernel interface assumes a single flat table mapping (address,
> data) to interrupts. We try to keep that table up-to-date in QEMU.
>
> 5) The reason the kernel has MSI info at all is to allow for IRQFDs to
> generate MSI interrupts.
>
> Is there anything that prevents us from using IRQFDs corresponding to
> the target of an MSI mapping and get rid of the MSI info in the kernel?

What would that interface look like? An MSI does not arrive at an I/O APIC
pin, so we can't use the existing "give me an irqfd for this pin" command.

Alex
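
For readers following the thread, point 4 of the quoted summary (a single flat table tying MSI (address, data) pairs to interrupts) can be pictured roughly as below. This is only a toy model in plain C, not QEMU or kernel code: every type and function name in it (msi_msg, msi_route_table, msi_route_add, and so on) is made up for illustration. The only facts it relies on are the x86 MSI layout from the SDM section cited above: the address window starts at FEE00000H, the destination ID sits in address bits 19:12, and the vector sits in data bits 7:0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only. All names here are hypothetical and are NOT
 * identifiers from QEMU or the Linux KVM API. */

#define X86_MSI_BASE UINT64_C(0x00000000FEE00000) /* x86 MSI window base */
#define X86_MSI_MASK UINT64_C(0xFFFFFFFFFFF00000) /* everything above bit 19 */
#define MAX_ROUTES   16

/* What a device writes: the data value goes to the address. */
struct msi_msg {
    uint64_t address;
    uint32_t data;
};

/* One entry of the flat table: an interrupt number (a GSI in KVM
 * terms) and the MSI message injected when it is triggered. */
struct msi_route {
    int gsi;
    struct msi_msg msg;
};

struct msi_route_table {
    struct msi_route routes[MAX_ROUTES];
    size_t count;
};

/* Returns 0 on success, -1 if the table is full. */
static int msi_route_add(struct msi_route_table *t, int gsi,
                         struct msi_msg msg)
{
    assert(t != NULL);
    if (t->count == MAX_ROUTES)
        return -1;
    t->routes[t->count].gsi = gsi;
    t->routes[t->count].msg = msg;
    t->count++;
    return 0;
}

/* What an irqfd-style trigger would do: turn the interrupt number
 * back into the message to deliver. Returns NULL if no route exists. */
static const struct msi_msg *msi_route_lookup(const struct msi_route_table *t,
                                              int gsi)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->routes[i].gsi == gsi)
            return &t->routes[i].msg;
    }
    return NULL;
}

/* True if the address falls in the x86 MSI window (FEE00000H..FEEFFFFFH). */
static int msi_is_x86_window(uint64_t address)
{
    return (address & X86_MSI_MASK) == X86_MSI_BASE;
}

/* x86 decoding: destination ID is address bits 19:12. */
static unsigned msi_dest_id(const struct msi_msg *m)
{
    return (unsigned)((m->address >> 12) & 0xff);
}

/* x86 decoding: vector is data bits 7:0. */
static unsigned msi_vector(const struct msi_msg *m)
{
    return (unsigned)(m->data & 0xff);
}
```

The model has no notion of a source device or of multiple PHBs, which is the limitation points 2 and 3 of the summary are getting at.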