On Wed, Sep 13, 2017 at 05:44:54PM +0100, Mark Cave-Ayland wrote:
> On 13/09/17 07:02, David Gibson wrote:
>
> >>> Alexey - do you recall from your analysis why these fields were no
> >>> longer deemed necessary, and how your TCG tests were configured?
> >>
> >> I most certainly did not do analysis (my bad. sorry) - I took the patch
> >> from David as he left the team, fixed it to compile and pushed it away. I am
> >> also very suspicious that we did not try migrating TCG or anything but
> >> pseries. My guess is that things did not break (if they did not, which I am
> >> not sure about for the TCG case) because the interrupt controller (XICS) or
> >> the pseries guest took care of resending an interrupt, which does not seem
> >> to be the case for mac99.
> >
> > Right, that's probably true. The main point, though, is that these
> > fields were dropped a *long* time ago, when migration was barely
> > working to begin with. In particular I'm pretty sure most of the
> > non-pseries platforms were already pretty broken for migration
> > (amongst other things).
> >
> > Polishing the mac platforms up to working again, including migration,
> > is a reasonable goal. But it can't be at the expense of pseries,
> > which is already working, used in production, and much better tested
> > than mac99 or g3beige ever were.
>
> Oh I completely agree since I'm well aware pseries likely has more users
> than the Mac machines - my question was directed more about why we
> support backwards migration.
>
> I spent several hours yesterday poking at my Darwin test case, trying
> the different combinations of pending_interrupts, irq_input_state and
> access_type, and could easily provoke migration failures unless all 3 of
> the fields were present, so a practical test shows they are still
> required for TCG migration. I think ppc_set_irq()'s use of the interrupt
> fields in hw/ppc/ppc.c and the subsequent reference to pending
> interrupts in target/ppc may explain why I see freezes/hangs until a key
> is pressed in many cases.
Ok, I think we need to consider (pending_interrupts and irq_input_state)
separately from access_type.

The first two are pretty closely related to each other, and I've got
at least a rough idea of what the problems there might be. access_type
I'm pretty sure has to be an unrelated problem, and I've got much less
of a handle on it.

I suspect we could work around the problems with pending_interrupts
and irq_input_state by having a post_load hook in the board level
interrupt controller to reassert its output irq line based on its
current state. I believe the relevant irq inputs to the cpu are
effectively level triggered, so I think that will be enough.

access_type I don't have any good ideas for yet. We really need to
work out what the exact race is here that's causing its state to be
lost harmfully.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson