On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <m...@redhat.com> wrote:
> > On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> > > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <m...@redhat.com> wrote:
> > > > On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> > > > > Hi David, Michael,
> > > > >
> > > > > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgib...@redhat.com> wrote:
> > > > > > On Thu, 22 Oct 2020 08:06:55 -0400
> > > > > > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > > > > > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > > > > > > > From: Marcel Apfelbaum <mar...@redhat.com>
> > > > > > > >
> > > > > > > > During a PCIe Root Port's transition from Power-Off to Power-On (or
> > > > > > > > vice versa) the "Slot Control Register" has the "Power Indicator
> > > > > > > > Control" set to "Blinking", expressing a "power transition" mode.
> > > > > > > >
> > > > > > > > Any hotplug operation during the "power transition" mode is not
> > > > > > > > permitted, or at least not expected by the Guest OS, leading to
> > > > > > > > strange failures.
> > > > > > > >
> > > > > > > > Detect and refuse hotplug operations in such a case.
> > > > > > > >
> > > > > > > > Signed-off-by: Marcel Apfelbaum <marcel.apfelb...@gmail.com>
> > > > > > > > ---
> > > > > > > >  hw/pci/pcie.c | 7 +++++++
> > > > > > > >  1 file changed, 7 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > > index 5b48bae0f6..2fe5c1473f 100644
> > > > > > > > --- a/hw/pci/pcie.c
> > > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > > > > > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> > > > > > > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> > > > > > > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > > > > > > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> > > > > > > >
> > > > > > > >      /* Check if hot-plug is disabled on the slot */
> > > > > > > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > > > > > > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > > > > > >          return;
> > > > > > > >      }
> > > > > > > >
> > > > > > > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > > > > > > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > > > > > > > +                   DEVICE(hotplug_pdev)->id);
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> > > > > > > >  }
> > > > > > >
> > > > > > > Probably the only way to handle it for existing machine types.
> > > > >
> > > > > I agree.
> > > > >
> > > > > > > For new ones, can't we queue it in host memory somewhere?
> > > > >
> > > > > I am not sure I understand what the flow would be:
> > > > >  - The user asks for a hotplug operation.
> > > > >  - QEMU defers the operation.
> > > > > After that the operation may still fail; how would the user know if
> > > > > the operation succeeded or not?
> > > >
> > > > How can it fail? It's just a button press ...
> > >
> > > Currently we have "Hotplug unsupported."
> > > With this change we have "Guest/System not ready".
> >
> > "Hotplug unsupported" is not an error that can trigger with well-behaved
> > management such as libvirt.
> >
> > > > > > I'm not actually convinced we can't do that even for existing
> > > > > > machine types.
> > > > >
> > > > > It is a Guest-visible change, I don't think we can do it.
> > > > >
> > > > > > So I'm a bit hesitant to suggest going ahead with this without
> > > > > > looking a bit closer at whether we can implement a wait-for-ready
> > > > > > in qemu, rather than forcing every user of qemu (human or machine)
> > > > > > to do so.
> > > > >
> > > > > While I agree it is a pain from the usability point of view, hotplug
> > > > > operations are allowed to fail. This is no more than a corner case;
> > > > > ensuring the right response (gracefully erroring out) may be enough.
> > > > >
> > > > > Thanks,
> > > > > Marcel
> > > >
> > > > I don't think they ever failed in the past, so management is unlikely
> > > > to handle the failure by retrying ...
> > >
> > > That would require some management handling, yes.
> > > But even without a "retry", failing is better than strange OS behavior.
> > >
> > > Trying a better alternative like deferring the operation for new machines
> > > would make sense, however it is out of the scope of this patch
> >
> > Expand the scope please. The scope should be "solve problem xx", not
> > "solve problem xx by doing abc".
>
> The scope is detecting a hotplug error early instead of passing to the
> Guest OS a hotplug operation that we know will fail.
>
Right. After detecting it, just failing unconditionally is a bit too
simplistic IMHO.

> > > that simply detects the error, leaving us in a slightly better state
> > > than today.
> > >
> > > Thanks,
> > > Marcel
> >
> > Not applying a patch is the only tool we maintainers have to influence
> > people to solve the problem fully. That's why I'm not inclined to apply
> > "slightly better" patches generally.
>
> The patch is a proposal following some offline discussions on this matter.
> I personally see the value of it versus what we have today.
>
> Thanks,
> Marcel
>
> > > > > >   --
> > > > > >   David Gibson <dgib...@redhat.com>
> > > > > >   Principal Software Engineer, Virtualization, Red Hat
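
For reference, a minimal standalone sketch of the Slot Control check the
patch performs, outside of QEMU. The register masks follow the PCIe spec
(Power Indicator Control is bits 9:8 of Slot Control) as mirrored in Linux
pci_regs.h; the helper name and the demo values are illustrative only, not
part of the patch.

/*
 * Standalone illustration of the "slot is in power transition" test.
 * 0x0300 masks the Power Indicator Control field; 0x0200 ("Blink")
 * indicates a power transition is in progress.
 */
#include <stdint.h>
#include <stdio.h>

#define SLTCTL_PIC            0x0300  /* Power Indicator Control mask  */
#define SLTCTL_PWR_IND_ON     0x0100  /* indicator on: slot powered    */
#define SLTCTL_PWR_IND_BLINK  0x0200  /* indicator blinking: transition */

/* Returns 1 if a hot-plug request should be refused for this slot. */
static int slot_in_power_transition(uint16_t sltctl)
{
    return (sltctl & SLTCTL_PIC) == SLTCTL_PWR_IND_BLINK;
}

int main(void)
{
    uint16_t blinking = SLTCTL_PWR_IND_BLINK;  /* hypothetical register value */
    uint16_t powered  = SLTCTL_PWR_IND_ON;     /* hypothetical register value */

    printf("blinking slot: refuse=%d\n", slot_in_power_transition(blinking));
    printf("powered slot:  refuse=%d\n", slot_in_power_transition(powered));
    return 0;
}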