On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote: > Hi David, > > On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgib...@redhat.com> wrote: > > On Thu, 22 Oct 2020 11:01:04 -0400 > "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote: > > [...] > > > > Right. After detecting just failing unconditionally it a bit too > > simplistic IMHO. > > There's also another factor here, which I thought I'd mentioned > already, but looks like I didn't: I think we're still missing some > details in what's going on. > > The premise for this patch is that plugging while the indicator is in > transition state is allowed to fail in any way on the guest side. I > don't think that's a reasonable interpretation, because it's unworkable > for physical hotplug. If the indicator starts blinking while you're in > the middle of shoving a card in, you'd be in trouble. > > So, what I'm assuming here is that while "don't plug while blinking" is > the instruction for the operator to obey as best they can, on the guest > side the rule has to be "start blinking, wait a while and by the time > you leave blinking state again, you can be confident any plugs or > unplugs have completed". Obviously still racy in the strict computer > science sense, but about the best you can do with slow humans in the > mix. > > So, qemu should of course endeavour to follow that rule as though it > was a human operator on a physical machine and not plug when the > indicator is blinking. *But* the qemu plug will in practice be fast > enough that if we're hitting real problems here, it suggests the guest > is still doing something wrong. > > > I personally think there is a little bit of over-engineering here. > Let's start with the spec: > >   Power Indicator Blinking >   A blinking Power Indicator indicates that the slot is powering up or > powering down and that >   insertion or removal of the adapter is not permitted. > > What exactly is an interpretation here? > As you stated, the races are theoretical, the whole point of the indicator > is to let the operator know he can't plug the device just yet. > > I understand it would be more user friendly if the QEMU would wait internally > for the > blinking to end, but the whole point of the indicator is to let the operator > (human or machine) > know they can't plug the device at a specific time. > Should QEMU take the responsibility of the operator? Is it even correct? > > Even if we would want such a feature, how is it related to this patch? > The patch simply refuses to start a hotplug operation when it knows it will > not > succeed. >  > Another way that would make sense to me would be is a new QEMU interface > other > than > "add_device", let's say "adding_device_allowed", that would return true if the > hotplug is allowed > at this point of time. (I am aware of the theoretical races)Â
Rather than adding_device_allowed, something like "query slot" might be helpful for debugging. That would help user figure out e.g. why isn't device visible without any races. > The above will at least mimic the mechanics of the pyhs world. The operator > looks at the indicator, > the management software checks if adding the device is allowed. > Since it is a corner case I would prefer the device_add to fail rather than > introducing a new interface, > but that's just me. > > Thanks, > Marcel > I think we want QEMU management interface to be reasonably abstract and agnostic if possible. Pushing knowledge of hardware detail to management will just lead to pain IMHO. We supported device_add which practically never fails for years, at this point it's easier to keep supporting it than change all users ... > > -- > David Gibson <dgib...@redhat.com> > Principal Software Engineer, Virtualization, Red Hat >