Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug

Daniel Henrique Barboza Tue, 29 Aug 2017 14:08:23 -0700


On 08/29/2017 04:23 AM, David Gibson wrote:

On Fri, Aug 25, 2017 at 06:11:18PM -0300, Daniel Henrique Barboza wrote:

v2:
- rebased with ppc-for-2.11
- function 'spapr_cas_completed' dropped
- function 'spapr_drc_needed' made public and it's now used inside
   'spapr_hotplugged_dev_before_cas'
- 'spapr_drc_needed' was changed to support the migration of logical
   DRCs with devs attached in UNUSED state
- new function: 'spapr_clear_pending_events'. This function is used
   inside ppc_spapr_reset to reset the pending_events QTAILQ

Thanks for the followup, unfortunately there is still an important bug
left, see comments on the patch itself.

At a higher level, though, looking at the event reset code made me
think of a possible even simpler solution to this problem.

The queue of events (both hotplug and epow) is already in a simple
internal form that's independent of the two delivery mechanisms.  The
only difference is what event source triggers the interrupt.  This
explains why an extra hotplug event after the CAS "unstuck" the queue.

AFAICT, a spurious interrupt here should be harmless - the kernel
will just check the queue and find nothing there.

So, it should be sufficient to, after CAS, pulse the hotplug queue
interrupt if the hotplug queue is negotiated.

This is something I tried in my first attempts at this problem, before
sending the first patch in which I blocked hotplug before CAS. Back then,
the problem was that the kernel panicked with sig 11 (access of bad area)
when receiving the pulse after CAS.

I've investigated it a bit today and it seems that it is still the case.
Firing an IRQ right after CAS breaks the kernel. In fact, if you time a
regular CPU hotplug right after CAS you'll get the same sig 11 kernel oops.
It looks like there is a time window after CAS in which the kernel can't
handle the hotplug process, and pulsing the hotplug queue in this window
breaks the guest. I've tried some hacks, such as pulsing the queue in the
first 'event_scan' call made by the guest, but apparently it is still too
early.

I've sent an email to the linuxppc-dev mailing list describing this
behavior and asking if there is a reliable way to know when we can safely
pulse the hotplug queue. Meanwhile, I'll keep working on the v3 respin of
this patch in case this solution of pulsing the hotplug queue ends up not
being feasible.
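
For reference, the "pulse the hotplug queue after CAS" idea can be modelled
roughly as below. This is only a toy sketch of the queue logic, not QEMU's
spapr events code: every name in it (event_queue, queue_event, cas_complete,
guest_check_queue) is made up for illustration, and it assumes that events
queued before CAS are signalled on the EPOW source. It does not model the
timing window described above, in which the guest cannot yet take the
interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in QEMU. */
enum event_source { SRC_EPOW = 0, SRC_HOTPLUG = 1 };

struct event_queue {
    int pending;              /* queued events, independent of delivery */
    bool hotplug_negotiated;  /* guest asked for the hotplug source at CAS */
    int irq_pulses[2];        /* interrupts raised per source (for inspection) */
};

static void pulse_irq(struct event_queue *q, enum event_source src)
{
    assert(q != NULL);
    q->irq_pulses[src]++;
}

/* Queue one event and signal it on whichever source is currently in use. */
static void queue_event(struct event_queue *q)
{
    q->pending++;
    pulse_irq(q, q->hotplug_negotiated ? SRC_HOTPLUG : SRC_EPOW);
}

/*
 * CAS finished. If the hotplug source was negotiated, pulse it once so the
 * guest rescans the shared queue and picks up events queued before CAS.
 */
static void cas_complete(struct event_queue *q, bool negotiated)
{
    q->hotplug_negotiated = negotiated;
    if (negotiated) {
        pulse_irq(q, SRC_HOTPLUG);
    }
}

/* Guest side: drain the queue. Returns 0 when the interrupt was spurious. */
static int guest_check_queue(struct event_queue *q)
{
    int handled = q->pending;
    q->pending = 0;
    return handled;
}
```

In this model, an event queued before CAS stays pending until cas_complete()
pulses the hotplug source, and a second pulse with an empty queue simply
returns 0 events, which is the "harmless spurious interrupt" case.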


Thanks,


Daniel
