Markus Armbruster <arm...@redhat.com> writes:

> Peter Xu <pet...@redhat.com> writes:
>
>> On Mon, Aug 23, 2021 at 05:56:23PM -0400, Eduardo Habkost wrote:
>>> I don't have any other example, but I assume address assignment
>>> based on ordering is a common pattern in device code.
>>>
>>> I would take a very close and careful look at the devices with
>>> non-default vmsd priority. If you can prove that the 13 device
>>> types with non-default priority are all order-insensitive, a
>>> custom sort function as you describe might be safe.
>>
>> Besides virtio-mem-pci, there'll also similar devfn issue with all
>> MIG_PRI_PCI_BUS, as they'll be allocated just like other pci devices. Say,
>> below two cmdlines will generate different pci topology too:
>>
>>   $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
>>                        -device pcie-root-port,chassis=1 \
>>                        -device virtio-net-pci
>>
>> And:
>>
>>   $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
>>                        -device virtio-net-pci
>>                        -device pcie-root-port,chassis=1 \
>>
>> This cannot be solved by keeping priority==0 ordering.
>>
>> After a second thought, I think I was initially wrong on seeing migration
>> priority and device realization the same problem.
>>
>> For example, for live migration we have a requirement on PCI_BUS being
>> migrated earlier than MIG_PRI_IOMMU because there's bus number information
>> required because IOMMU relies on the bus number to find address spaces.
>> However that's definitely not a requirement for device realizations, say,
>> realizing vIOMMU after pci buses are fine (bus assigned during bios).
>>
>> I've probably messed up with the ideas (though they really look alike!).
>> Sorry about that.
>>
>> Since the only ordering constraint so far is IOMMU vs all the rest of
>> devices, I'll introduce a new priority mechanism and only make sure vIOMMUs
>> are realized earlier. That'll also avoid other implications on pci devfn
>> allocations.
>>
>> Will rework a new version tomorrow. Thanks a lot for all the comments,
>
> Is it really a good idea to magically reorder device realization just to
> make a non-working command line work? Why can't we just fail the
> non-working command line in a way that tells users how to get a working
> one? We have way too much ordering magic already...
>
> If we decide we want more magic, then I'd argue for *dependencies*
> instead of priorities. Dependencies are specific and local: $this needs
> to go after $that because $reasons. Priorities are unspecific and
> global.

Having thought about this a bit more...

Constraints on realize order are nothing new. For instance, when a
device plugs into a bus, it needs to be realized after the device
providing the bus.

We ensure this by having the device refer to the bus, e.g. bus=pci.0.
The reference may be implicit, but it's there. It must resolve for
device creation to succeed, and if it resolves, the device providing the
bus will be realized in time.

I believe what's getting us into trouble with IOMMU is not having such a
reference. Or in other words, keeping the dependence between the IOMMU
and the devices relying on it *implicit*, and thus hidden from the
existing realize-ordering machinery.

Instead of inventing another such machinery, let's try to use the one we
already have.
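
To illustrate the idea of an explicit reference that must resolve, here
is a minimal sketch. It is a toy model, not QEMU code: Device, refs and
realize_in_order are names made up for this example, and treating the
IOMMU dependence as just another reference is an assumption about how
such a scheme could look, not a description of what QEMU does today.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one -device argument."""
    dev_id: str
    # IDs of devices this one explicitly refers to (a bus provider,
    # or, under the assumption above, an IOMMU).
    refs: list = field(default_factory=list)


def realize_in_order(devices):
    """Realize devices in the order given, as on a command line.

    Every reference must resolve to an already realized device.
    An unresolved reference is an error that says how to fix the
    order, instead of silently reordering anything.
    """
    realized = []
    for dev in devices:
        for ref in dev.refs:
            if ref not in realized:
                raise ValueError(
                    f"device '{dev.dev_id}' refers to '{ref}', "
                    f"which does not exist yet; "
                    f"create '{ref}' before '{dev.dev_id}'"
                )
        realized.append(dev.dev_id)
    return realized


if __name__ == "__main__":
    # Working order: the referenced device comes first.
    print(realize_in_order([Device("iommu0"),
                            Device("net0", refs=["iommu0"])]))
    # Non-working order: fails with a message naming the fix.
    try:
        realize_in_order([Device("net0", refs=["iommu0"]),
                          Device("iommu0")])
    except ValueError as err:
        print(err)
```

The point of the sketch is that no priority table is involved: the
dependence is local to the device that has it, and a command line that
violates it fails with a message that tells the user what to change.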