On 22/05/12 16:31, Alexander Graf wrote: > > > On 22.05.2012, at 08:11, Alexey Kardashevskiy <a...@ozlabs.ru> wrote: > >> On 22/05/12 15:52, Alexander Graf wrote: >>> >>> >>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <a...@ozlabs.ru> wrote: >>> >>>> On 22/05/12 13:21, Alexander Graf wrote: >>>>> >>>>> >>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt >>>>> <b...@kernel.crashing.org> wrote: >>>>> >>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>>>>> Alexander, >>>>>>> >>>>>>> Is that any better? :) >>>>>> >>>>>> Alex (Graf that is), ping ? >>>>>> >>>>>> The original patch from Alexey was fine btw. >>>>>> >>>>>> VFIO will always call things with the existing capability offset so >>>>>> there's no real risk of doing the wrong thing or break the list or >>>>>> anything. >>>>>> >>>>>> IE. A small simple patch that addresses the problem :-) >>>>>> >>>>>> The new patch is a bit more "robust" I believe, I don't think we need to >>>>>> go too far to fix a problem we don't have. But we need a fix for the >>>>>> real issue and the simple patch does it neatly from what I can >>>>>> understand. >>>>>> >>>>>> Cheers, >>>>>> Ben. >>>>>> >>>>>>> >>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>>>>> * in pci config space */ >>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>> uint8_t offset, uint8_t size) >>>>>>> { >>>>>>> - uint8_t *config; >>>>>>> + uint8_t *config, existing; >>>>> >>>>> Existing is a pointer to the target dev's config space, right? >>>> >>>> Yes. >>>> >>>>>>> int i, overlapping_cap; >>>>>>> >>>>>>> + existing = pci_find_capability(pdev, cap_id); >>>>>>> + if (existing) { >>>>>>> + if (offset && (existing != offset)) { >>>>>>> + return -EEXIST; >>>>>>> + } >>>>>>> + for (i = existing; i < size; ++i) { >>>>> >>>>> So how does this possibly make sense? >>>> >>>> Although I do not expect VFIO to add capabilities (does not make sense), I >>>> still want to double >>>> check that this space has not been tried to use by someone else. >>> >>> i is an int. existing is a uint8_t*. >> >> >> It was there before me. This function already does a loop and this is how it >> was coded at the first place. > > Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...
Well it is still does not make much sense to have "int i" rather than "uint8_t i" :) >>>>>>> + if (pdev->used[i]) { >>>>>>> + return -EFAULT; >>>>>>> + } >>>>>>> + } >>>>>>> + memset(pdev->used + offset, 0xFF, size); >>>>> Why? >>>> >>>> Because I am marking the space this capability takes as used. >>> >>> But if it already existed (at the same offset), it should be set used >>> already, no? Unless size > existing size, in which case you might overwrite >>> data in the next chunk, no? >> >> >> No, it does not exist for VFIO - VFIO read the config space from the host >> kernel first and then calls msi_init or msix_init or whatever_else_init >> depending on what it got from the host kernel. And these xxx_init() >> functions eventually call pci_add_capability(). > So why would the function that populates the config space initially not set > the used flag correctly? This is internal kitchen of PCIDevice which I do not want to touch from anywhere but pci.c. And there is no "fixup_capability" or something. >> Sure we can either implement own msi_init/msix_init (and may be others in >> the future) for VFIO (which would do all the same as other QEMU devices >> except touching the capabilities) OR hack msi_init/msix_init not to touch >> capabilities if they exist. > No, calling the internal functions sounds fine to me. It's the step before > that irritates me. VFIO shouldn't differ too much from an emulated device wrt > its config space population really. The last thing we want for a VFIO device is changing its capabilities list. >>>>>>> + /* Make capability read-only by default */ >>>>>>> + memset(pdev->wmask + offset, 0, size); >>>>> Why? >>>> >>>> Because the pci_add_capability() does it for a new capability by default. >>> >>> Hrm. So you're copying code? Can't you merge the overwrite and write cases? >> >> I am trying to make it as a single chunk which is as small as possible. > > No, you're needlessly duplicating code which is a bad idea :). Please reuse > as much of the existing function as possible, unless it really doesn't make > sense. I actually duplicated 4 (four) lines and did it just once. This is too little to be called "duplicating" :) And I get very special case visually separated and easy to remove if we find a better solution later. But - no problemo, I'll rework it. [no further comments] >> If it helps, below is the same patch with extended context to see what is >> going on in that function. >> >> >> >> >> >> >> hw/pci.c | 20 +++++++++++++++++++- >> 1 files changed, 19 insertions(+), 1 deletions(-) >> >> diff --git a/hw/pci.c b/hw/pci.c >> index 63a8219..7008a42 100644 >> --- a/hw/pci.c >> +++ b/hw/pci.c >> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool >> is_default_rom) >> ptr = memory_region_get_ram_ptr(&pdev->rom); >> load_image(path, ptr); >> g_free(path); >> >> if (is_default_rom) { >> /* Only the default rom images will be patched (if needed). */ >> pci_patch_ids(pdev, ptr, size); >> } >> >> qemu_put_ram_ptr(ptr); >> >> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); >> >> return 0; >> } >> >> static void pci_del_option_rom(PCIDevice *pdev) >> { >> if (!pdev->has_rom) >> return; >> >> vmstate_unregister_ram(&pdev->rom, &pdev->qdev); >> memory_region_destroy(&pdev->rom); >> pdev->has_rom = false; >> } >> >> /* >> * if !offset >> * Reserve space and add capability to the linked list in pci config space >> * >> * if offset = 0, >> * Find and reserve space and add capability to the linked list >> * in pci config space */ >> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> uint8_t offset, uint8_t size) >> { >> - uint8_t *config; >> + uint8_t *config, existing; >> int i, overlapping_cap; >> >> + existing = pci_find_capability(pdev, cap_id); >> + if (existing) { >> + if (offset && (existing != offset)) { >> + return -EEXIST; >> + } >> + for (i = existing; i < size; ++i) { >> + if (pdev->used[i]) { >> + return -EFAULT; >> + } >> + } > > } > >> + memset(pdev->used + offset, 0xFF, size); >> + /* Make capability read-only by default */ >> + memset(pdev->wmask + offset, 0, size); >> + /* Check capability by default */ >> + memset(pdev->cmask + offset, 0xFF, size); >> + return existing; >> + } >> + >> if (!offset) { > > && !existing maybe? > >> offset = pci_find_space(pdev, size); >> if (!offset) { >> return -ENOSPC; >> } >> } else { >> /* Verify that capabilities don't overlap. Note: device assignment >> * depends on this check to verify that the device is not broken. >> * Should never trigger for emulated devices, but it's helpful >> * for debugging these. */ >> for (i = offset; i < offset + size; i++) { >> overlapping_cap = pci_find_capability_at_offset(pdev, i); >> if (overlapping_cap) { >> fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " >> "Attempt to add PCI capability %x at offset " >> "%x overlaps existing capability %x at offset %x\n", >> pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), >> PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), >> cap_id, offset, overlapping_cap, i); >> return -EINVAL; >> } >> } >> } >> > > If (!existing) { > >> config = pdev->config + offset; >> config[PCI_CAP_LIST_ID] = cap_id; >> config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> pdev->config[PCI_CAPABILITY_LIST] = offset; >> pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > > } > > which poses the question why the above wouldn't apply for the existing case. > It would work just as well to leave that in, no? > > Alex > >> memset(pdev->used + offset, 0xFF, size); >> /* Make capability read-only by default */ >> memset(pdev->wmask + offset, 0, size); >> /* Check capability by default */ >> memset(pdev->cmask + offset, 0xFF, size); >> return offset; >> } >> >> >> >> >>>>>>> + /* Check capability by default */ >>>>>>> + memset(pdev->cmask + offset, 0xFF, size); >>>>> >>>>> I don't understand this part either. >>>> >>>> The pci_add_capability() does it for a new capability by default. >>>> >>>> >>>> >>>>> >>>>> Alex >>>>> >>>>>>> + return existing; >>>>>>> + } >>>>>>> + >>>>>>> if (!offset) { >>>>>>> offset = pci_find_space(pdev, size); >>>>>>> if (!offset) { >>>>>>> return -ENOSPC; >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>>>>> >>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>>>>> >>>>>>>>>> 11.05.2012 20:52, Alexander Graf =0?8A0;: >>>>>>>>>>> >>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>>>>> >>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>>>>>> is being built by QEMU. >>>>>>>>>>>> >>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>>>>>> >>>>>>>>>>>> For example, the old code destroys the following config >>>>>>>>>>>> of PCIe Intel E1000E: >>>>>>>>>>>> >>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>>>>> 0x34: 0xC8 >>>>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>>>> >>>>>>>>>>>> after: >>>>>>>>>>>> 0x34: 0xD0 >>>>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>>>> >>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>>>>> >>>>>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>>>>> the same type capability is about to add. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>>>>>> --- >>>>>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>>>>> --- a/hw/pci.c >>>>>>>>>>>> +++ b/hw/pci.c >>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, >>>>>>>>>>>> uint8_t cap_id, >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> config = pdev->config + offset; >>>>>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>>>>> >>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to >>>>>>>>>>> do a loop through all capabilities, check if the one you want to >>>>>>>>>>> add is there already and if so either >>>>>>>>>>> * replace the existing one or >>>>>>>>>>> * drop out and not write the new one in. >>>>>>>>> >>>>>>>>> * hw_error :) >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm not sure which way would be more natural. >>>>>>>>>> >>>>>>>>>> There is a third option - add another function, lets call it >>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() >>>>>>>>>> does >>>>>>>>>> but won't touch list pointers. >>>>>>>>> >>>>>>>>> What good is a function that breaks internal consistency? >>>>>>>> >>>>>>>> >>>>>>>> It is broken already by having PCIDevice.used field. Normally >>>>>>>> pci_add_capability() would go through >>>>>>>> the whole list and add a capability if it does not exist. Emulated >>>>>>>> devices which care about having a >>>>>>>> capability at some fixed offset would have initialized their config >>>>>>>> space before calling this >>>>>>>> capabilities API (as VFIO does). >>>>>>>> >>>>>>>> If we really want to support emulated devices which want some >>>>>>>> capabilities be at fixed offset and >>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad >>>>>>>> to restore this consistency >>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at >>>>>>>> different location as a guest >>>>>>>> driver may care about its offset. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>>>>> exactly that the capability exists and where it is and it calls >>>>>>>>>> pci_add_capability() based on this knowledge so doing additional >>>>>>>>>> loops >>>>>>>>>> just for imaginery scalability is a bit weird, no? >>>>>>>>> >>>>>>>>> Not sure I understand your proposal. The more generic a framework is, >>>>>>>>> the better, no? In this code path we don't care about speed. We only >>>>>>>>> care about consistency and reliability. -- Alexey