On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
>
> On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <m...@redhat.com> wrote:
>
>     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
>     > Hi David, Michael,
>     >
>     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgib...@redhat.com> wrote:
>     >
>     >     On Thu, 22 Oct 2020 08:06:55 -0400
>     >     "Michael S. Tsirkin" <m...@redhat.com> wrote:
>     >
>     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
>     >     > > From: Marcel Apfelbaum <mar...@redhat.com>
>     >     > >
>     >     > > During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
>     >     > > the "Slot Control Register" has the "Power Indicator Control"
>     >     > > set to "Blinking" expressing a "power transition" mode.
>     >     > >
>     >     > > Any hotplug operation during the "power transition" mode is not permitted
>     >     > > or at least not expected by the Guest OS leading to strange failures.
>     >     > >
>     >     > > Detect and refuse hotplug operations in such case.
>     >     > >
>     >     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelb...@gmail.com>
>     >     > > ---
>     >     > >  hw/pci/pcie.c | 7 +++++++
>     >     > >  1 file changed, 7 insertions(+)
>     >     > >
>     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>     >     > > index 5b48bae0f6..2fe5c1473f 100644
>     >     > > --- a/hw/pci/pcie.c
>     >     > > +++ b/hw/pci/pcie.c
>     >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>     >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>     >     > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>     >     > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
>     >     > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>     >     > >
>     >     > >      /* Check if hot-plug is disabled on the slot */
>     >     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
>     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>     >     > >          return;
>     >     > >      }
>     >     > >
>     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
>     >     > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
>     >     > > +                   DEVICE(hotplug_pdev)->id);
>     >     > > +        return;
>     >     > > +    }
>     >     > > +
>     >     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>     >     > >  }
>     >     >
>     >     > Probably the only way to handle for existing machine types.
>     >
>     > I agree
>     >
>     >     > For new ones, can't we queue it in host memory somewhere?
>     >
>     > I am not sure I understand what will be the flow.
>     >   - The user asks for a hotplug operation.
>     >   - QEMU deferred operation.
>     > After that the operation may still fail, how would the user know if the operation
>     > succeeded or not?
>     >
>     How can it fail? It's just a button press ...
>
> Currently we have "Hotplug unsupported."
> With this change we have "Guest/System not ready"

Hotplug unsupported is not an error that can trigger with a well
behaved management such as libvirt.

>     >
>     >     I'm not actually convinced we can't do that even for existing machine
>     >     types.
>     >
>     > Is a Guest visible change, I don't think we can do it.
>     >
>     >     So I'm a bit hesitant to suggest going ahead with this without
>     >     looking a bit closer at whether we can implement a wait-for-ready in
>     >     qemu, rather than forcing every user of qemu (human or machine) to do
>     >     so.
>     >
>     > While I agree it is a pain from the usability point of view, hotplug operations
>     > are allowed to fail. This is not more than a corner case, ensuring the right
>     > response (gracefully erroring out) may be enough.
>     >
>     > Thanks,
>     > Marcel
>     >
>     I don't think they ever failed in the past so management is unlikely
>     to handle the failure by retrying ...
>
> That would require some management handling, yes.
> But even without a "retry", failing is better than strange OS behavior.
>
> Trying a better alternative like deferring the operation for new machines
> would make sense, however is out of the scope of this patch

Expand the scope please. The scope should be "solve a problem xx" not
"solve a problem xx by doing abc".

> that simply
> detects the error leaving us in a slightly better state than today.
>
> Thanks,
> Marcel

Not applying a patch is the only tool we maintainers have to influence
people to solve the problem fully. That's why I'm not inclined to apply
"slightly better" patches generally.

>     >     --
>     >     David Gibson <dgib...@redhat.com>
>     >     Principal Software Engineer, Virtualization, Red Hat