On Fri, Feb 07, 2025 at 06:18:50PM +0000, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 17:48, Peter Xu <pet...@redhat.com> wrote:
> >
> > On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
> > > (I wonder if we ought to suggest quiescing outstanding
> > > DMA in the enter phase? But it's probably easier to fix
> > > the iommus like this series does than try to get every
> > > dma-capable pci device to do something different.)
> >
> > I wonder if we should provide some generic helper to register vIOMMU reset
> > callbacks, so that we'll be sure any vIOMMU model impl that will register
> > at exit() phase only, and do nothing during the initial two phases.  Then
> > we can put some rich comment on that helper on why.
> >
> > Looks like it means the qemu reset model in the future can be a combination
> > of device tree (which resets depth-first) and the three phases model.  We
> > will start to use different approach to solve different problems.
> 
> The tree of QOM devices (i.e. the one based on the qbus buses
> and rooted at the sysbus) resets depth-first, but it does so in
> three phases: first we traverse everything doing 'enter'; then
> we traverse everything doing 'hold'; then we traverse everything
> doing 'exit'. There *used* to be an awkward mix of some things
> being three-phase and some not, but we have now got rid of all
> of those so a system reset does a single three-phase reset run
> which resets everything.

Right.  Sorry, I wasn't very clear earlier about what I wanted to express.

My understanding is that the three-phase reset, even though it exists, was
not designed to order things like the vIOMMU and devices, whose relationship
is already described by the system topology.  Ordering device resets is,
IMHO, exactly what the QOM topology is meant to achieve, and the whole
depth-first reset method would make sense with it.

So from that specific POV, we now use a mixture of both methods to order
resets across devices (rather than to order the reset steps within a single
device, which is what the 3 phases provide).  That may not be very intuitive
when someone reads about the two reset mechanisms, as one would naturally
take the vIOMMU as a root object of all the PCIe devices under the root
complex, and think that the reset order is already guaranteed by QOM.  In
reality it's not.  So that's the part I wonder if we want to document.

So we must make sure of both:

  - All vIOMMUs across all archs must tear down their mappings only in
    their exit() phase, keeping the mappings available to all devices during
    the initial 2 phases (we could probably even assert that the initial 2
    phase functions are NULL when there's a base class).  Meanwhile,

  - All PCIe devices must quiesce their DMA in the initial 2 phases,
    guaranteeing that no in-flight DMA is possible during the whole 3rd
    exit() phase, because any vIOMMU implementation can start to tear down
    its device mappings as the very first entry in the 3rd phase (IOW,
    there's also no ordering constraint within the 3rd phase: a vIOMMU's
    exit() may be invoked before the devices' exit()).
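
To make the two rules above concrete, here is a toy, standalone C model.
It is only a sketch and not QEMU code: ToyDev, toy_reset_all(),
toy_run_reset() and the "nic" endpoint are names made up for illustration,
and the model assumes the endpoint drains its DMA by the end of hold.  It
runs each phase over all devices before moving on to the next phase, and
counts DMA accesses that hit an already torn-down mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not QEMU's API. */
typedef struct ToyDev ToyDev;
typedef void (*ToyPhaseFn)(ToyDev *dev);

struct ToyDev {
    const char *name;
    ToyPhaseFn enter, hold, exit;   /* any phase may be NULL */
};

static bool mapping_valid;   /* vIOMMU translation still usable */
static bool dma_in_flight;   /* endpoint may still issue DMA */
static int dma_faults;       /* DMA attempts that hit a torn-down mapping */

static void toy_dma_access(void)
{
    if (dma_in_flight && !mapping_valid) {
        dma_faults++;
    }
}

/* Endpoint: DMA may still run in enter/hold; it is drained by end of hold. */
static void nic_enter(ToyDev *d) { (void)d; toy_dma_access(); }
static void nic_hold(ToyDev *d)  { (void)d; toy_dma_access(); dma_in_flight = false; }
static void nic_exit(ToyDev *d)  { (void)d; toy_dma_access(); }

static void viommu_teardown(ToyDev *d) { (void)d; mapping_valid = false; }

/* Run each phase over every device before starting the next phase. */
static void toy_reset_all(ToyDev **devs, size_t n)
{
    for (int phase = 0; phase < 3; phase++) {
        for (size_t i = 0; i < n; i++) {
            ToyPhaseFn fn = phase == 0 ? devs[i]->enter
                          : phase == 1 ? devs[i]->hold
                                       : devs[i]->exit;
            if (fn) {
                fn(devs[i]);
            }
        }
    }
}

/* Returns the number of faulting DMA accesses seen during one reset. */
int toy_run_reset(bool viommu_first, bool teardown_in_hold)
{
    ToyDev nic = { "nic", nic_enter, nic_hold, nic_exit };
    ToyDev viommu = { "viommu", NULL,
                      teardown_in_hold ? viommu_teardown : NULL,
                      teardown_in_hold ? NULL : viommu_teardown };
    ToyDev *devs[2];

    devs[0] = viommu_first ? &viommu : &nic;
    devs[1] = viommu_first ? &nic : &viommu;

    mapping_valid = true;
    dma_in_flight = true;
    dma_faults = 0;
    toy_reset_all(devs, 2);
    assert(!mapping_valid && !dma_in_flight);
    return dma_faults;
}
```

In this model, tearing down in exit() gives zero faults whichever device is
visited first within a phase, while tearing down in hold() only works if
the endpoint happens to be visited before the vIOMMU.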

I'm not sure whether it's important to document this; I only thought about
it in case we want the rationale for this design to be crystal clear.

Thanks,

-- 
Peter Xu

