On 24.02.2022 15:53, Roger Pau Monné wrote: > On Thu, Feb 24, 2022 at 01:58:31PM +0100, Jan Beulich wrote: >> On 24.02.2022 13:43, Roger Pau Monne wrote: >>> TBD: it's unclear whether we still need the pcidevs_lock in >>> iommu_dev_iotlb_flush_timeout. The caller of >>> iommu_dev_iotlb_flush_timeout is already bogus as it iterates over a >>> list of pdevs without holding the pcidevs_lock. >> >> Analysis of whether / where recursive uses are needed should imo >> include cases where the lock ought to be held, but currently isn't >> (like apparently this case). > > Well, I'm not opposed to that. I think just aiming to get the current > usages analyzed will already be hard.
While I fully agree, the decision to drop recursive locking would better not put roadblocks in the way of adding locking where it is currently missing. >>> @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg, >>> u8 bus, u8 devfn, u32 flag) >>> ASSERT(pdev && (pdev->domain == hardware_domain || >>> pdev->domain == dom_io)); >>> >>> + /* Do not allow broken devices to be assigned. */ >>> + rc = -EBADF; >>> + if ( pdev->broken ) >>> + goto done; >> >> I think this wants exceptions for Dom0 and DomIO. An admin may be >> able to fix things in Dom0, e.g. by updating device firmware. > > Doesn't a device get assigned to DomIO (or Dom0 if not using quarantine > mode), and then when deassigned from DomIO gets assigned to Dom0? > > So there's no usage of assign_device in the path that (re)assigns a > device used by a guest into Dom0? Well, this assumes all tool stacks behave like the xl one presently does. Which I think is the wrong way round: Making a device assignable should be "deassign from Dom0" (implicitly making it land in DomIO), while removing a device from the set of assignable ones should be "assign to Dom0" (implicitly deassigning from DomIO). That way (I think I said so elsewhere earlier on) libxl would also avoid the need to actually use DOMID_IO explicitly. It using DOMID_SELF like done in various other places would seem more clen to me. Paul - I think it was you who decided to make it work the way it currently does. Do you have any insights into your thought process back then which you could share? > Or would you rather imply that pdev->broken should get cleared at some > point (ie: when the device is assigned back to Dom0)? I did ponder this for a while when writing the earlier reply. But I decided against, for it being a functional change: _pci_hide_device() currently is also sticky. But yes, in principle if a misbehaving device _can_ be fixed (e.g. by updating its firmware), then I think there needs to be a way to clear the "broken" flag. Jan