On 01/11/16 19:01, Marcel Apfelbaum wrote: > On 01/11/2016 07:15 PM, Laszlo Ersek wrote: >> On 01/11/16 17:34, Marcel Apfelbaum wrote: >>> On 01/11/2016 06:11 PM, Laszlo Ersek wrote: >>>> On 01/11/16 13:24, Marcel Apfelbaum wrote: >>>>> Two reasons: >>>>> - PCI Spec indicates that while the bit is not set >>>>> the memory sizing is not finished. >>>>> - pci_bar_address will return PCI_BAR_UNMAPPED >>>>> and a previous value can be accidentally overridden >>>>> if the command register is modified (and not the BAR). >>>>> >>>>> Signed-off-by: Marcel Apfelbaum <mar...@redhat.com> >>>>> --- >>>>> >>>>> Hi, >>>>> >>>>> I found this when trying to use multiple root complexes with OVMF. >>>>> >>>>> When trying to attach a device to the pxb-pcie device as Integrated >>>>> Device it did not receive the IO/MEM resources. >>>>> >>>>> The reason is that OVMF is working like that: >>>>> 1. It disables the Decode (I/O or memory) bit in the Command >>>>> register >>>>> 2. It configures the device BARS >>>>> 3. Makes some tests on the Command register >>>>> 4. ... >>>>> 5. Enables the Decode (I/O or memory) at some point. >>>>> >>>>> On step 3 all the BARS are overridden to 0xffffffff by QEMU. >>>>> >>>>> Since QEMU uses the device BARs to compute the new host bridge >>>>> resources >>>>> it now gets garbage. >>>>> >>>>> Laszlo, this also solves the SHPC problem for the pci-2-pci bridge >>>>> inside the pxb. >>>>> Now we can enable the SHPC for it too. >>>> >>>> I encountered the exact same problem months ago. I posted patches for >>>> it; you were CC'd. :) >>>> >>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342209 >>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342210 >>>> >>>> As you can see under the second link above, I made the same analysis & >>>> observations as you do now. (It took me quite long to track down the >>>> "inexplicable" behavior of edk2's generic PCI bus driver / enumerator >>>> that is built into OVMF.) >>> >>> Wow, I just re-worked this issue again from 0! I wish I have remembered >>> those threads :( >>> This was another symptom of the exact problem! And I remembered >>> something about >>> SHPC, I should have looked at those mail threads again... >>> >>>> >>>> I proposed to change pci_bar_address() so that it could return, to >>>> distinguished callers, the BAR values "under programming", even if the >>>> command bits were clear. Then the ACPI generator would utilize this >>>> special exception. >>>> >>>> Michael disagreed; in >>>> >>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342242 >>>> >>>> he wrote "[t]his is problematic - disabled BAR values have no meaning >>>> according to the PCI spec". >>>> >>> >>> Yes... because it looked like a hook for our case only, >>> the good news is that this patch is based exactly on the fact that >>> the BARs have no meaning if the bit is not set. >>> >>>> The current solution to the problem (= we disable the SHPC) was >>>> recommended by Michael in that message: "It might be best to add a >>>> property to just disable shpc in the bridge so no devices reside >>>> directly behind the pxb?" >>>> >>> >>> I confess I don't exactly understand what the SHPC of the pci-2-pci >>> bridge >>> has to do with sibling devices on the pxb's root bus (SHPC is the >>> hot-plug controller >>> for the devices behind the pci-2-pci bridge). >>> >>> The second part I do understand, the pxb design was to not have devices >>> directly behind >>> the pxb, so maybe he meant that SHPC is the part of the pci-bridge that >>> behaves like >>> a device in the sense it requires IO/MEM resources. >>> >>> Bottom line, your solution for the PXB was just fine :) >>> >>> >>>> In comparison, your patch doesn't change pci_bar_address(). Instead, it >>>> modifies pci_update_mappings() *not to call* pci_bar_address(), if the >>>> respective command bits are clear. >>>> >>>> I guess that could have about the same effect. >>>> >>>> If, unlike my patch, yours actually improves QEMU's compliance with the >>>> PCI specs, then it's likely a good patch. (And apparently more general >>>> than the SHPC-specific solution we have now.) >>> >>> >>> Exactly! Why should a pci write to the command register *delete* >>> previously set resources? I am looking at it as a bug. >>> >>> And also updating the mappings while the Decoding bit is not enables >>> is at least not necessary. >>> >>>> >>>> I just don't know if it's a good idea to leave any old mappings active >>>> while the BARs are being reprogrammed (with the command bits clear). >>>> >>> >>> First, because the OS can't use the IO/MEM anyway, secondly the guest >>> OS/firmware >>> is the one that disabled the bit... (in order to program resources) >> >> I have something like the following in mind. Do you think it is a valid >> (although contrived) use case? >> >> - guest programs some BAR and uses it (as designed / intended) >> - guest disables command bit, modifies BAR location >> - guest accesses *old* BAR location >> >> What should a guest *expect* in such a case? Is this invalid guest >> behavior? > > Yes, this is indeed invalid behavior, from the device point of view > it is disabled. Best case scenario - the guest will see 0xffffffff, > worst case - garbage. > >> >> If it is not invalid, then will QEMU comply with the guest's >> expectations if your patch is applied? Pre-patch, the guest would likely >> access a "hole" in the host bridge MMIO aperture, whereas with your >> patch (I guess?) it still might access the device through the old (still >> active) BAR? >> > > Since the IO is disabled, pci_bar_address will return PCI_BAR_UNMAPPED > and *no updates will be made* pre or post this patch. It will behave the > same > from the guest point of view. It will still access the memory region > of the device. > > >> Or would QEMU prevent that just by virtue of the command bit being clear? > > Again, this patch only changes the behavior in a specific case: > when the device is disabled and the guest writes to the command register > without > enabling IO/MEM.
Ah, right! That's exactly what the edk2 PCI bus driver / enumerator does. Massages the command register without touching those two bits, and... > > Pre-patch -> the old BAR addresses are overridden with 0xffffffff > (I really think this is a bug, nobody asked for this!!) the previously programmed BAR address gets lost. It's great that you have the PCI knowledge to state that this is actually a bug! It had looked fishy to me as well, but I couldn't make the same argument. > Post-Patch -> the old BAR values are returned to the prev state -> > correct behavior IMHO. I agree. > Please continue to ask questions until we get to the bottom of it. :) Okay, I think I can now try to review this patch. See below. > > Thanks, > Marcel > >> >> Thanks >> Laszlo >> >>>> In other words, what guarantees that this change will not regress >>>> anything? (I'm not doubting -- I'm asking; I honestly don't know.) >>>> >>>> So I guess I'll defer to Michael on this one. >>> >>> Michael, do you agree with the above? >>> >>>> >>>> In any case, I fully agree with your analysis of OVMF's behavior. >>> >>> Thanks! I looked for this bug in OVMF for some time now :) >>> Marcel >>> >>>> >>>> Thanks! >>>> Laszlo >>>> >>>>> Thanks, >>>>> Marcel >>>>> >>>>> hw/pci/pci.c | 17 +++++++++++++++++ >>>>> 1 file changed, 17 insertions(+) >>>>> >>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c >>>>> index 168b9cc..f9127dc 100644 >>>>> --- a/hw/pci/pci.c >>>>> +++ b/hw/pci/pci.c >>>>> @@ -1148,6 +1148,7 @@ static void pci_update_mappings(PCIDevice *d) >>>>> PCIIORegion *r; >>>>> int i; >>>>> pcibus_t new_addr; >>>>> + uint16_t cmd = pci_get_word(d->config + PCI_COMMAND); >>>>> >>>>> for(i = 0; i < PCI_NUM_REGIONS; i++) { >>>>> r = &d->io_regions[i]; >>>>> @@ -1156,6 +1157,22 @@ static void pci_update_mappings(PCIDevice *d) >>>>> if (!r->size) >>>>> continue; >>>>> >>>>> + /* >>>>> + * Do not update the mappings until the command register's >>>>> + * Decode (I/O or memory) bit is not set. Two reasons: I propose the following wording change (for noob's like myself :)) Do not update this BAR's mapping if the command register's decode bit (I/O or memory, matching the BAR's type) is clear. Two reasons: ... Spelling out "this BAR's mapping" is clearer to me -- we're looping over the BAR's. (The end result is the same of course.) >>>>> + * - PCI Spec indicates that while the bit is not set >>>>> + * the memory sizing is not finished. I propose "BAR sizing". >>>>> + * - pci_bar_address will return PCI_BAR_UNMAPPED I propose pci_bar_address() -- i.e., parens. >>>>> + * and a previous value can be accidentally overridden I recommend "may be unintentionally" over "can be accidentally". >>>>> + * if the command register is modified (and not the BAR). >>>>> + * */ The last line should simply terminate the comment block -- runaway asterisk I think. >>>>> + if (((r->type & PCI_BASE_ADDRESS_SPACE_IO) && >>>>> + !(cmd & PCI_COMMAND_IO)) || >>>>> + ((r->type != PCI_BASE_ADDRESS_SPACE_IO) && >>>>> + !(cmd & PCI_COMMAND_MEMORY))) { >>>>> + continue; >>>>> + } >>>>> + It might be equivalent, but in the second part, I'd feel better about !(r->type & PCI_BASE_ADDRESS_SPACE_IO) than (r->type != PCI_BASE_ADDRESS_SPACE_IO) Or even better: uint16_t decode_bit = (r->type & PCI_BASE_ADDRESS_SPACE_IO) ? PCI_COMMAND_IO : PCI_COMMAND_MEMORY; if (!(cmd & decode_bit)) { continue; } ... The placement of the "continue" statement looks good. Just some thoughts; I won't complain if the patch is committed as-is. :) Thanks Laszlo >>>>> new_addr = pci_bar_address(d, i, r->type, r->size); >>>>> >>>>> /* This bar isn't changed */ >>>>> >>>> >>>> >>> >> >