Hi,
On 2/7/25 6:31 PM, Peter Xu wrote: > On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote: >> On Fri, 7 Feb 2025 at 16:54, Peter Xu <pet...@redhat.com> wrote: >>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: >>>> This is a follow-up of Peter's attempt to fix the fact that >>>> vIOMMUs are likely to be reset before the device they protect: >>>> >>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices >>>> https://lore.kernel.org/all/20240117091559.144730-1-pet...@redhat.com/ >>>> >>>> This is especially observed with virtio devices when a qmp system_reset >>>> command is sent but also with VFIO devices. >>>> >>>> This series puts the vIOMMU reset in the 3-phase exit callback. >>>> >>>> This scheme was tested successful with virtio-devices and some >>>> VFIO devices. Nevertheless not all the topologies have been >>>> tested yet. >>> Eric, >>> >>> It's great to know that we seem to be able to fix everything in such small >>> changeset! >>> >>> I would like to double check two things with you here: >>> >>> - For VFIO's reset hook, looks like we have landed more changes so that >>> vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the >>> reset during "hold" phase only (via legacy_reset_hold()). That part >>> will make sure vIOMMU (if switching to exit()-only reset) will order >>> properly with VFIO. Is my understanding correct here? >> Yes, we now do a reset of the whole system as a three-phase setup, >> and the old pre-three-phase reset APIs like qemu_register_reset() and >> device_class_set_legacy_reset() all happen during the "hold" phase. >> >>> - Is it possible if some PCIe devices that will provide its own >>> phase.exit(), would it matter on the order of PCIe device's >>> phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use >>> exit()-only approach like this one)? >> It's certainly possible for a PCIe device to implement >> a three-phase reset which does things in the exit phase. However >> I think I would say that such a device which didn't cancel all >> outstanding DMA operations during either 'enter' or 'hold' >> phases would be broken. If it did some other things during >> the 'exit' phase I don't think the ordering of those vs the >> iommu 'exit' handling should matter. > Yes, this sounds fair. > >> (To some extent the splitting into three phases is trying >> to set up a consistent model as outlined in docs/devel/reset.rst >> and to some extent it's just a convenient way to get a basic >> "this reset thing I need to do must happen after some other >> device has done its reset things" which you can achieve >> by ad-hoc putting them in different phases. Ideally we get >> mostly the former and a little pragmatic dose of the latter, >> but the consistent model is not very solidly nailed down >> so I have a feeling the proportions may not be quite as >> lopsided as we'd like :-) ) > Yes, it's a good move that we can have other ways to fix all the problems > without major surgery, and it also looks solid and clean if we have plan to > fix any outlier PCIe devices. > > If there will be a repost after all, not sure if Eric would like to add > some of above discussions into either some commit messages or cover letter. > Or some comment in the code might be even better. Yes I will definitively augment commit msgs/cover letter with all those considerations. Thank you very much for this rich discussion! Eric > > Thanks! >