Re: [Qemu-devel] [PATCH] msi/msix: added API to set MSI message address and data

Alexey Kardashevskiy Thu, 19 Jul 2012 07:51:00 -0700

On 20/07/12 00:43, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2012 at 12:24:05AM +1000, Alexey Kardashevskiy wrote:
>> One comment below.
>> 
>> 
>> On 19/07/12 19:27, Michael S. Tsirkin wrote:
>>> On Thu, Jul 19, 2012 at 10:32:40AM +1000, Alexey Kardashevskiy
>>> wrote:
>>>> On 19/07/12 01:23, Michael S. Tsirkin wrote:
>>>>> On Wed, Jul 18, 2012 at 11:17:12PM +1000, Alexey Kardashevskiy
>>>>> wrote:
>>>>>> On 18/07/12 22:43, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Jun 21, 2012 at 09:39:10PM +1000, Alexey
>>>>>>> Kardashevskiy wrote:
>>>>>>>> Added (msi|msix)_set_message() functions.
>>>>>>>> 
>>>>>>>> Currently msi_notify()/msix_notify() write to these
>>>>>>>> vectors to signal the guest about an interrupt so the
>>>>>>>> correct values have to be written there by the guest or
>>>>>>>> QEMU.
>>>>>>>> 
>>>>>>>> For example, POWER guest never initializes MSI/MSIX
>>>>>>>> vectors, instead it uses RTAS hypercalls. So in order to
>>>>>>>> support MSIX for virtio-pci on POWER we have to initialize
>>>>>>>> MSI/MSIX message from QEMU.
>>>>>>>> 
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>>>> 
>>>>>>> So guests do enable MSI through config space, but do not
>>>>>>> fill in vectors?
>>>>>> 
>>>>>> Yes. msix_capability_init() calls arch_setup_msi_irqs() which
>>>>>> does everything it needs to do (i.e. calls hypervisor) before
>>>>>> msix_capability_init() writes PCI_MSIX_FLAGS_ENABLE to the
>>>>>> PCI_MSIX_FLAGS register.
>>>>>> 
>>>>>> These vectors are the PCI bus addresses, the way they are set
>>>>>> is specific for a PCI host controller, I do not see why the
>>>>>> current scheme is a bug.
>>>>> 
>>>>> It won't work with any real PCI device, will it? Real pci devices
>>>>> expect vectors to be written into their memory.
>>>> 
>>>> 
>>>> Yes. And the hypervisor does this. On POWER (at least book3s -
>>>> server powerpc), the whole config space kitchen is hidden behind
>>>> RTAS (kind of bios). For the guest, this RTAS is implemented in
>>>> hypervisor, for the host - in the system firmware. So powerpc
>>>> linux does not have to have PHB drivers. Kinda cool.
>>>> 
>>>> Usual powerpc server is running without the host linux at all, it
>>>> is running a hypervisor called pHyp. And every guest knows that it
>>>> is a guest, there is no full machine emulation, it is
>>>> para-virtualization. In power-kvm, we replace that pHyp with the
>>>> host linux and now QEMU plays a hypervisor role. Some day we will
>>>> move the hypervisor to the host kernel completely (?) but now it
>>>> is in QEMU.
>>> 
>>> Okay. So it is a POWER-specific weirdness as I suspected. Sure, if
>>> this is what real hardware does we pretty much have to emulate
>>> this.
>>> 
>>>>>>> Very strange. Are you sure it's not just a guest bug? How
>>>>>>> does it work for other PCI devices?
>>>>>> 
>>>>>> Did not get the question. It works the same for every PCI
>>>>>> device under POWER guest.
>>>>> 
>>>>> I mean for real PCI devices.
>>>>> 
>>>>>>> Can't we just fix guest drivers to program the vectors
>>>>>>> properly?
>>>>>>> 
>>>>>>> Also pls address the comment below.
>>>>>> 
>>>>>> Comment below.
>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>>>> ---
>>>>>>>>  hw/msi.c  |   13 +++++++++++++
>>>>>>>>  hw/msi.h  |    1 +
>>>>>>>>  hw/msix.c |    9 +++++++++
>>>>>>>>  hw/msix.h |    2 ++
>>>>>>>>  4 files changed, 25 insertions(+)
>>>>>>>> 
>>>>>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>>>>>> index 5233204..cc6102f 100644
>>>>>>>> --- a/hw/msi.c
>>>>>>>> +++ b/hw/msi.c
>>>>>>>> @@ -105,6 +105,19 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
>>>>>>>>      return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg)
>>>>>>>> +{
>>>>>>>> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>>>>>>> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>>>>>>>> +
>>>>>>>> +    if (msi64bit) {
>>>>>>>> +        pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address);
>>>>>>>> +    } else {
>>>>>>>> +        pci_set_long(dev->config + msi_address_lo_off(dev), msg.address);
>>>>>>>> +    }
>>>>>>>> +    pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
>>>>>>>> +}
>>>>>>>> +
>>>>>>> 
>>>>>>> Please add documentation. Something like
>>>>>>> 
>>>>>>> /* * Special API for POWER to configure the vectors through 
>>>>>>> * a side channel. Should never be used by devices. */
>>>>>> 
>>>>>> 
>>>>>> It is useful for any para-virtualized environment I believe,
>>>>>> is it not? For s390 as well. Of course, only if it supports PCI,
>>>>>> which I am not sure it does though :)
>>>>> 
>>>>> I expect the normal guest to program the address into MSI
>>>>> register using config accesses, same way that it enables
>>>>> MSI/MSIX. Why POWER does it differently I did not yet figure out
>>>>> but I hope this weirdness is not so widespread.
>>>> 
>>>> 
>>>> In para-virt I would expect the guest not to touch config space at
>>>> all. At least it should use one interface rather than two but this
>>>> is how it is.
>>> 
>>> It's not new that firmware developers consistently make
>>> inconsistent design decisions :)
>> 
>> 
>> It depends on how you look at it. Enabling MSI via the config space is
>> also done via a special set of hypervisor calls (common and
>> IBM-specific) so it is all hidden in one place - the system firmware,
>> which is cool - no PHB drivers in the guest. Although MSI would not
>> need any additional hypercall to init vectors (everything can be done
>> via config space), there is MSI-X which stores vectors in a BAR and
>> there is no hypercall for BARs as they are simply memory mapped. This
>> is, I think, why the firmware people (or phyp but it is probably the
>> same) added IBM-specific MSI/MSIX config hypercalls.
> 
> Well what's wrong with guest doing this through a memory mapped
> interface?



Would the guest not then have to allocate the addresses and program the
PHB with them? The idea was to hide PHB details in the system firmware;
that is the whole point.
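
To make the "QEMU fills in the message" part concrete, here is a minimal
standalone sketch of what the proposed msi_set_message() does. It is a
model only, not QEMU code: ModelDev and every model_* helper are
hypothetical stand-ins for PCIDevice and the pci_set_*() accessors, and
only the register offsets inside the MSI capability are taken from the
PCI spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offsets inside the MSI capability (PCI spec layout). */
#define MODEL_MSI_FLAGS        2      /* Message Control, 16 bit  */
#define MODEL_MSI_ADDRESS_LO   4      /* Message Address, low 32  */
#define MODEL_MSI_DATA_32      8      /* Message Data, 32-bit cap */
#define MODEL_MSI_DATA_64      12     /* Message Data, 64-bit cap */
#define MODEL_MSI_FLAGS_64BIT  0x0080 /* 64-bit address capable   */

/* Hypothetical stand-in for PCIDevice: only what the sketch needs. */
typedef struct ModelDev {
    uint8_t config[256];  /* conventional config space    */
    uint8_t msi_cap;      /* offset of the MSI capability */
} ModelDev;

/* Store the low 'nbytes' bytes of 'val' in little-endian order. */
static void model_put_le(uint8_t *p, uint64_t val, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(val >> (8 * i));
    }
}

/* Read 'nbytes' little-endian bytes back. */
static uint64_t model_get_le(const uint8_t *p, unsigned nbytes)
{
    uint64_t val = 0;
    for (unsigned i = 0; i < nbytes; i++) {
        val |= (uint64_t)p[i] << (8 * i);
    }
    return val;
}

/* What the platform code would do on behalf of the guest: write the
 * address and data straight into the device's MSI capability. */
static void model_msi_set_message(ModelDev *dev, uint64_t address,
                                  uint16_t data)
{
    uint8_t *cap = dev->config + dev->msi_cap;
    bool msi64bit = model_get_le(cap + MODEL_MSI_FLAGS, 2) &
                    MODEL_MSI_FLAGS_64BIT;

    /* The capability must lie entirely inside config space. */
    assert((unsigned)dev->msi_cap + MODEL_MSI_DATA_64 + 2u <=
           sizeof(dev->config));

    model_put_le(cap + MODEL_MSI_ADDRESS_LO, address, msi64bit ? 8 : 4);
    model_put_le(cap + (msi64bit ? MODEL_MSI_DATA_64 : MODEL_MSI_DATA_32),
                 data, 2);
}
```

The guest never sees these writes; in the scheme described above they
would happen when the RTAS call is handled, before the guest sets the
enable bit.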


>> And I do not quite understand why MSIX people could not use extended
>> PCI config space which is 4096 bytes, quite a lot, enough to fit 256
>> vectors (have not seen a card which asked for more than 9 _per
>> function_). If somebody really needs 2048, he may want 16384 as well
>> (or any other crazy number), etc, so why did they put such a limit
>> when it is a BAR and it is huge? :) Ah, offtopic anyway.


> Well you have just described MSI, just don't use MSIX.
> 
> The motivation for MSIX was as follows: PCI/PCI-X config space is not
> 4096 bytes, it is 256 bytes, and is very crowded. You are thinking of
> PCI express. 

MSIX is a PCIe feature, no?
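
For the size argument itself, the arithmetic can be checked with a rough
sketch like the one below. The 16-byte entry size and the 2048-entry
maximum come from the MSI-X definition; the helper names and the
'reserved' parameter are made up for illustration, and the pending bit
array is not counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MSIX_ENTRY_SIZE   16u    /* bytes per MSI-X table entry */
#define MODEL_MSIX_MAX_ENTRIES  2048u  /* 11-bit Table Size field     */
#define MODEL_PCI_CFG_SIZE      256u   /* conventional PCI / PCI-X    */
#define MODEL_PCIE_CFG_SIZE     4096u  /* PCI Express extended space  */

/* Bytes needed to hold 'nentries' vectors. */
static unsigned model_msix_table_bytes(unsigned nentries)
{
    assert(nentries >= 1 && nentries <= MODEL_MSIX_MAX_ENTRIES);
    return nentries * MODEL_MSIX_ENTRY_SIZE;
}

/* Would the table fit in a config space of 'cfg_size' bytes if
 * 'reserved' bytes are already taken by the header and capabilities? */
static bool model_msix_fits(unsigned nentries, unsigned cfg_size,
                            unsigned reserved)
{
    assert(reserved <= cfg_size);
    return model_msix_table_bytes(nentries) <= cfg_size - reserved;
}
```

With these numbers 256 entries take exactly 4096 bytes, so they fill the
whole PCIe config space with nothing left for the header, and 2048
entries need 32 KiB.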

> Config accesses are also nonposted which means at most one
> can be in flight. This is not appropriate for vector programming which
> needs to be done from multiple CPUs in parallel.

> Also offtopic, please try to avoid these super long lines in mail :).

Ah. This is from the time when I posted patches via thunderbird and
disabled wrapping :) Is wrapping at 75 chars ok?


>> 
>> 
>>>>>>>>  bool msi_enabled(const PCIDevice *dev)
>>>>>>>>  {
>>>>>>>>      return msi_present(dev) &&
>>>>>>>> diff --git a/hw/msi.h b/hw/msi.h
>>>>>>>> index 75747ab..6ec1f99 100644
>>>>>>>> --- a/hw/msi.h
>>>>>>>> +++ b/hw/msi.h
>>>>>>>> @@ -31,6 +31,7 @@ struct MSIMessage {
>>>>>>>>  
>>>>>>>>  extern bool msi_supported;
>>>>>>>>  
>>>>>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg);
>>>>>>>>  bool msi_enabled(const PCIDevice *dev);
>>>>>>>>  int msi_init(struct PCIDevice *dev, uint8_t offset,
>>>>>>>>               unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
>>>>>>>> diff --git a/hw/msix.c b/hw/msix.c
>>>>>>>> index ded3c55..5f7d6d3 100644
>>>>>>>> --- a/hw/msix.c
>>>>>>>> +++ b/hw/msix.c
>>>>>>>> @@ -45,6 +45,15 @@ static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>>>>>>>>      return msg;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +void msix_set_message(PCIDevice *dev, int vector, struct MSIMessage msg)
>>>>>>>> +{
>>>>>>>> +    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
>>>>>>>> +
>>>>>>>> +    pci_set_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR, msg.address);
>>>>>>>> +    pci_set_long(table_entry + PCI_MSIX_ENTRY_DATA, msg.data);
>>>>>>>> +    table_entry[PCI_MSIX_ENTRY_VECTOR_CTRL] &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  /* Add MSI-X capability to the config space for the device. */
>>>>>>>>  /* Given a bar and its size, add MSI-X table on top of it
>>>>>>>>   * and fill MSI-X capability in the config space.
>>>>>>>> diff --git a/hw/msix.h b/hw/msix.h
>>>>>>>> index 50aee82..26a437e 100644
>>>>>>>> --- a/hw/msix.h
>>>>>>>> +++ b/hw/msix.h
>>>>>>>> @@ -4,6 +4,8 @@
>>>>>>>>  #include "qemu-common.h"
>>>>>>>>  #include "pci.h"
>>>>>>>>  
>>>>>>>> +void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
>>>>>>>> +
>>>>>>>>  int msix_init(PCIDevice *pdev, unsigned short nentries,
>>>>>>>>                MemoryRegion *bar,
>>>>>>>>                unsigned bar_nr, unsigned bar_size);
>>>>>>>> -- 
>>>>>>>> 1.7.10
>>>>>>>> 
>>>>>>>> ps. double '-' and git version is an end-of-patch scissor
>>>>>>>> as I read somewhere, cannot recall where exactly :)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 21/06/12 20:56, Jan Kiszka wrote:
>>>>>>>>> On 2012-06-21 12:50, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 21/06/12 20:38, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-06-21 12:28, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 21/06/12 17:39, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-06-21 09:18, Alexey Kardashevskiy
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> agrhhh. sha1 of the patch changed after
>>>>>>>>>>>>>> rebasing :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Added (msi|msix)_(set|get)_message() functions
>>>>>>>>>>>>>> for whoever might want to use them.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Currently msi_notify()/msix_notify() write to
>>>>>>>>>>>>>> these vectors to signal the guest about an
>>>>>>>>>>>>>> interrupt so the correct values have to be
>>>>>>>>>>>>>> written there by the guest or QEMU.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For example, POWER guest never initializes
>>>>>>>>>>>>>> MSI/MSIX vectors, instead it uses RTAS
>>>>>>>>>>>>>> hypercalls. So in order to support MSIX for
>>>>>>>>>>>>>> virtio-pci on POWER we have to initialize
>>>>>>>>>>>>>> MSI/MSIX message from QEMU.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As only the set* functions are required for now, the
>>>>>>>>>>>>>> "get" functions were added or made public for
>>>>>>>>>>>>>> symmetry.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  hw/msi.c  |   29 +++++++++++++++++++++++++++++
>>>>>>>>>>>>>>  hw/msi.h  |    2 ++
>>>>>>>>>>>>>>  hw/msix.c |   11 ++++++++++-
>>>>>>>>>>>>>>  hw/msix.h |    3 +++
>>>>>>>>>>>>>>  4 files changed, 44 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>>>>>>>>>>>> index 5233204..9ad84a4 100644
>>>>>>>>>>>>>> --- a/hw/msi.c
>>>>>>>>>>>>>> +++ b/hw/msi.c
>>>>>>>>>>>>>> @@ -105,6 +105,35 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
>>>>>>>>>>>>>>      return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> +MSIMessage msi_get_message(PCIDevice *dev)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> MSIMessage msi_get_message(PCIDevice *dev,
>>>>>>>>>>>>> unsigned vector)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Who/how/why is going to calculate the vector
>>>>>>>>>>>> here?
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>>>>>>>>>>>>> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>>>>>>>>>>>>>> +    MSIMessage msg;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if (msi64bit) {
>>>>>>>>>>>>>> +        msg.address = pci_get_quad(dev->config + msi_address_lo_off(dev));
>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>> +        msg.address = pci_get_long(dev->config + msi_address_lo_off(dev));
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    msg.data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And I have this here in addition:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     unsigned int nr_vectors = msi_nr_vectors(flags);
>>>>>>>>>>>>>     ...
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     if (nr_vectors > 1) {
>>>>>>>>>>>>>         msg.data &= ~(nr_vectors - 1);
>>>>>>>>>>>>>         msg.data |= vector;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> See PCI spec and existing code.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> What for? I really do not get why someone might
>>>>>>>>>>>> want to read something other than the real value. Which
>>>>>>>>>>>> PCI code should I look at?
>>>>>>>>>>> 
>>>>>>>>>>> I'm not sure what your use case for reading the
>>>>>>>>>>> message is. For KVM device assignment it is
>>>>>>>>>>> preparing an alternative message delivery path for
>>>>>>>>>>> MSI vectors. And for this we will need vector
>>>>>>>>>>> notifier support for MSI as well. You can check the
>>>>>>>>>>> MSI-X code for corresponding use cases of 
>>>>>>>>>>> msix_get_message.
>>>>>>>>>> 
>>>>>>>>>>> And when we already have msi_get_message, another
>>>>>>>>>>> logical use case is msi_notify. See msix.c again.
>>>>>>>>>> 
>>>>>>>>>> Aaaa.
>>>>>>>>>> 
>>>>>>>>>> I have no case for reading the message. All I need is
>>>>>>>>>> writing. And I want it public as I want to use it from
>>>>>>>>>> hw/spapr_pci.c. You suggested to add reading, I added
>>>>>>>>>> "get" to be _symmetric_ to "set" ("get" returns what
>>>>>>>>>> "set" wrote). You want a different thing which I can
>>>>>>>>>> do but it is not msi_get_message(), it is something
>>>>>>>>>> like msi_prepare_message(MSImessage msg) or 
>>>>>>>>>> msi_set_vector(uint16_t data) or simply internal
>>>>>>>>>> kitchen of msi_notify().
>>>>>>>>>> 
>>>>>>>>>> Still can do what you suggested, it just does not seem
>>>>>>>>>> right.
>>>>>>>>> 
>>>>>>>>> It is right - when looking at it from a different angle.
>>>>>>>>> ;)
>>>>>>>>> 
>>>>>>>>> I don't mind if you add msi_get_message now or leave
>>>>>>>>> this to me. Likely the latter is better as you have no
>>>>>>>>> use case for msi_get_message (and also
>>>>>>>>> msix_get_message!) outside of their modules, thus we
>>>>>>>>> should not export those functions anyway.
>> 
>> 
>> -- Alexey
>> 
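
On the multi-vector snippet Jan quoted far above: as far as I understand
it, with more than one MSI vector enabled the low bits of the data word
select the vector, which is what the nr_vectors masking is for. A small
standalone sketch of just that step follows; model_msi_vector_data() is
a hypothetical helper name, not something from hw/msi.c.

```c
#include <assert.h>
#include <stdint.h>

/* Return the data word the device would send for 'vector', given the
 * data value programmed into the capability and the number of enabled
 * vectors. */
static uint16_t model_msi_vector_data(uint16_t data, unsigned nr_vectors,
                                      unsigned vector)
{
    /* MSI allows 1, 2, 4, 8, 16 or 32 vectors. */
    assert(nr_vectors >= 1 && nr_vectors <= 32);
    assert((nr_vectors & (nr_vectors - 1)) == 0);
    assert(vector < nr_vectors);

    if (nr_vectors > 1) {
        /* Replace the low log2(nr_vectors) bits with the vector number. */
        data = (uint16_t)((data & ~(nr_vectors - 1)) | vector);
    }
    return data;
}
```

This is why a "get" that takes a vector argument returns something
different from what a plain "set" wrote, which was the source of the
confusion in that part of the thread.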


-- 
Alexey
