On Thu, 23 Dec 2021 08:33:09 -0500
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Wed, Dec 22, 2021 at 04:10:07PM -0700, Alex Williamson wrote:
> > On Wed, 22 Dec 2021 15:48:24 -0500
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> >   
> > > On Wed, Dec 22, 2021 at 12:08:09PM -0700, Alex Williamson wrote:  
> > > > On Tue, 21 Dec 2021 18:40:09 -0500
> > > > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > The reset is actually just an attempt to approximate power off.
> > > So I'm not sure that is right powering device off and then on
> > > is just a slow but reasonable way for guests to reset a device.  
> > 
> > Agree, I don't have a problem with resetting devices in response to the
> > slot being powered off, just that it's pointless, and in some scenarios
> > causes us additional grief when it occurs when the device is being
> > removed anyway.
> >   
> > > >  In that case we could reorganize things to let the unplug occur
> > > > before the power transition.    
> > > 
> > > Hmm you mean unplug on host immediately when it starts blinking?
> > > But drivers are not notified at this point, are they?  
> > 
> > I think this is confusing Attention Indicator and Power Indicator
> > again.  
> 
> Let's try to clear it up.
> 
> Here's text from the SHPC spec, pcie spec is less clear imho but
> same idea IIUC.
> 
> The Power Indicator provides visual feedback to the human operator (if the 
> system
> software accepts the request initiated by the Attention Button) by blinking. 
> Once the
> Power Indicator begins blinking, a 5-second abort interval exists during 
> which a second
> depression of the Attention Button cancels the operation.
> 
> Attention Indicator is confusingly unrelated to the Attention Button.
> Right?

Yeah, I think that's where I was getting confused.  So a qdev_unplug()
results in "pushing" the attention button, the power indicator starts
flashing for 5s, during which an additional attention button press
could cancel the event.  After that 5s period and with the power
indicator still flashing, the power controller is set to off, followed
by the power indicator turning off.

> >  The write sequence I noted for the slot control register was as
> > follows:
> > 
> >     01f1 - > 02f1 -> 06f1 -> 07f1
> > 
> >  01f1:
> >    Attention Indicator: OFF
> >    Power Indicator: ON
> >    Power Controller: ON
> > 
> >  02f1:
> >    Power Indicator: ON -> BLINK
> > 
> >  06f1:
> >    Power Controller: ON -> OFF
> > 
> >  07f1:
> >    Power Indicator: BLINK -> OFF
> > 
> > The device reset currently occurs at 06f1, when the Power Controller
> > removes power to the slot.  The unplug doesn't occur until 07f1 when
> > the Power Indicator turns off.  On bare metal, the device power would
> > be in some indeterminate state between those two writes as the power
> > drains.  We've chosen to reset the device at the beginning of this
> > phase, where power is first removed (ie. instantaneous power drain),
> > but on a physical device it happens somewhere in between.  
> 
> Yes, this is true I think. But I think on bare metal it's guaranteed to
> happen within 1 second after power is removed, whatever the state of the
> power indicator.
> Also, Gerd attempted to add PV code that special cases KVM and
> removes all the multi-second waiting from unplug path.
> So I am not sure we should rely on the 1 second wait, either.

Right, if we don't reset when power is removed we're in a guessing game
of whether the guest is following our assumed transitions.

> >  It therefore
> > seems valid that it could happen at the moment the Power Indicator
> > turns off such that we could precede the device reset with any
> > necessary unplug operations.  
> 
> However the power indicator is merely an indicator for the
> human operator. My understanding is that driver that does not want to permit
> removing the device can turn off power without turning off
> the power indicator.

Yes, on bare metal there's likely some small window where the device
power state is indeterminate, but to take advantage of that we'd need
to do something like setup a 2s timer to reset the device, where that
timer gets canceled if the power indicator turns off in the meantime.
It's a lot of heuristics.

> > > >  Of course the original proposal also
> > > > essentially supports this interpretation, the slot power off reset does
> > > > not occur for devices with a pending unplug and those devices are
> > > > removed after the slot transition grace period.    
> > > 
> > > Meaning the patch you posted? It relies on guest doing a specific
> > > thing though, and guest and host states are not synchronized.  
> > 
> > The proposed patch just means we won't reset the device in response to
> > slot power if an unplug is pending.  So sure, if it's true that a guest
> > playing with the Power Controller without using the Power Indicator to
> > reflect the slot state could skip a device reset, but is that valid
> > guest behavior?  
> 
> I'm not 100% sure:
> The Power Indicator in the Off state indicates that insertion or removal of 
> an the adapter is
> permitted.
> 
> but also
> 
>       System software must cause a slot’s Power Indicator to be turned off 
> when the slot
>       is not powered and/or it is permissible to insert or remove an adapter.
> 
> this and/or confuses me, but I think the "or" here is simply misguided.
> The SHPC spec from which the interface in pcie was inherited just says:
> 
>       When the Power Indicator is off, it means that main power to the slot 
> is off and that
>       insertion or removal of an add-in card is permitted.

I think the power indicator is clearly our guideline for when we're
allowed to insert or remove a device from the slot.  Other than
resetting when slot power is removed as we do now, the above timer hack
is the only solution I can think of that guarantees we eventually reset
the device after power is removed without relying on the power
indicator transition.  But I'm not sure it's worthwhile.

> > > I think it might work to defer reset if it's blinking until it actually
> > > stops blinking. To me it seems a bit less risky but but again, in theory
> > > some guest driver could use the power cycle reset while hotplug plays
> > > with PIC waiting for the cancel button press.  
> > 
> > That's essentially my previous suggestion above.  The Power Indicator
> > blinking tells us the slot power is in transition, the option to press
> > the attention button to abort has passed.  I understand the abort
> > window to be governed by the Attention Indicator blinking.  
> 
> what text in the spec gives you that impression?

My misunderstanding of the attention vs power indicators.

> > > E.g. I suspect your patch can be broken just by guest loading/unloading
> > > driver in a loop while host also triggers plug/unplug.  
> > 
> > It's not clear to me why that might fail.  Can you elaborate?  All it
> > does is skip the reset when an unplug is pending, but the actual unplug
> > finalizes the device and any subsequent plug necessarily uses a new
> > device, so I don't see what goes wrong.  
> 
> host wants to start unplug
> meanwhile guest wants to attempt a reset (for its own reasons)
> we skip the reset to device retains a bunch of state in
> its registers, the guest attempts to drive it assuming
> a reset device.

Under the condition of the kernel device lock being held, we can't
reset the device anyway.  But yes, we don't need to extend that
vfio-pci limitation to other devices.  Unless you're interested in
pursuing a timer based solution, which I guess would model a physical
system with super capacitors on the power rail that will drain
eventually, or instantly with the power indicator turning off, I think
the best we can do at the moment is to clean up the error reporting in
vfio-pci, report that the reset failed, but not some obscure error
about un-owned groups because we've fallen through to unexpected reset
methods.  A reasonable error message for this condition can be
considered a feature rather than a regression.  It's arguably correct
to try to reset the device here.  I'll post a different patch where we
clean up the vfio-pci error reporting for this case.  Thanks,

Alex


Reply via email to