On 02/10/2019 16:21, Greg Kurz wrote: > On Wed, 2 Oct 2019 11:02:45 +1000 > David Gibson <da...@gibson.dropbear.id.au> wrote: > >> On Tue, Oct 01, 2019 at 06:56:28PM +0200, Greg Kurz wrote: >>> On Tue, 1 Oct 2019 13:56:10 +0200 >>> Cédric Le Goater <c...@kaod.org> wrote: >>> >>>> On 01/10/2019 13:06, Greg Kurz wrote: >>>>> On Tue, 1 Oct 2019 10:57:22 +0200 >>>>> Cédric Le Goater <c...@kaod.org> wrote: >>>>> >>>>>> When vCPUs are hotplugged, they are added to the QEMU CPU list before >>>>>> being fully realized. This can crash the XIVE presenter because the >>>>>> 'tctx' pointer is not necessarily initialized when looking for a >>>>>> matching target. >>>>>> >>>>> >>>>> Ouch... :-\ >>>>> >>>>>> These vCPUs are not valid targets for the presenter. Skip them. >>>>>> >>>>> >>>>> This likely fixes this specific issue, but more generally, this >>>>> seems to indicate that using CPU_FOREACH() is rather fragile. >>>>> >>>>> What about tracking XIVE TM contexts with a QLIST in xive.c ? >>>> >>>> This is a good idea. >>>> >>>> On HW, the thread interrupt contexts belong to the XIVE presenter >>>> subengine. This is the logic doing the CAM line matching to find >>>> a target for an event notification. So we should model the context >>>> list below the XiveRouter in QEMU which models both router and >>>> presenter subengines. We have done without a presenter model for >>>> the moment and I don't think we will need to introduce one. >>>> >>>> This would be a nice improvements of my patchset adding support >>>> for xive escalations and better support of multi chip systems. >>>> I have introduced a PNV_CHIP_CPU_FOREACH in this patchset which >>>> would become useless with a list of tctx under the XIVE interrupt >>>> controller, XiveRouter, SpaprXive, PnvXive. >>>> >>> >>> I agree. It makes more sense to have the list below the XiveRouter, >>> rather than relying on CPU_FOREACH(), which looks a bit weird from >>> a device emulation code perspective. >> >> That does sound like a better idea long term. However, for now, I >> think the NULL check is a reasonable way of fixing the real error >> we're hitting, so I've applied the patch here. >> > > Fair enough. > > Reviewed-by: Greg Kurz <gr...@kaod.org> > >> Future cleanups to a different approach remain welcome, of course. >> > > I've started to look. This should simplify Cedric's "add XIVE support > for KVM guests" series, but I'll wait for your massive cleanup series > to get merged first.
This is a combo patchset. These are prereqs fixes related to the presenter and how CAM lines are scanned. This is in direct relation with the CPU_FOREACH() issue. ppc/xive: Introduce a XivePresenter interface ppc/xive: Implement the XivePresenter interface ppc/pnv: Introduce a PNV_CHIP_CPU_FOREACH() helper ppc/pnv: Introduce a pnv_xive_is_cpu_enabled() helper ppc/xive: Introduce a XiveFabric interface ppc/pnv: Implement the XiveFabric interface ppc/spapr: Implement the XiveFabric interface ppc/xive: Use the XiveFabric and XivePresenter interfaces These are for interrupt resend (escalation). To be noted, the removal of the get_tctx() XiveRouter handler which has some relation with the previous subserie. ppc/xive: Extend the TIMA operation with a XivePresenter parameter ppc/pnv: Clarify how the TIMA is accessed on a multichip system ppc/xive: Move the TIMA operations to the controller model ppc/xive: Remove the get_tctx() XiveRouter handler ppc/xive: Introduce a xive_tctx_ipb_update() helper ppc/xive: Introduce helpers for the NVT id ppc/xive: Synthesize interrupt from the saved IPB in the NVT This is a bug : ppc/pnv: Remove pnv_xive_vst_size() routine ppc/pnv: Dump the XIVE NVT table ppc/pnv: Skip empty slots of the XIVE NVT table This is a model for the block id configuration and better support for multichip systems : ppc/pnv: Introduce a pnv_xive_block_id() helper ppc/pnv: Extend XiveRouter with a get_block_id() handler Misc improvements : ppc/pnv: Quiesce some XIVE errors ppc/xive: Introduce a xive_os_cam_decode() helper ppc/xive: Check V bit in TM_PULL_POOL_CTX ppc/pnv: Improve trigger data definition ppc/pnv: Use the EAS trigger bit when triggering an interrupt from PSI I should move some in front to have them merged before hand if possible. They have been on the list for 3 months. >>>> Next step would be to get rid of the tctx->cs pointer. In my latest >>>> patches, it is only used to calculate the HW CAM line. >>>> >>>> There might be some consequences on the object hierarchy and it will >>>> break migration. >>>> >>> >>> This could break if the contexts were devices sitting in a bus, which >>> isn't the case here. I'll try to come up with a proposal for spapr, >>> and we can work out the changes on your recent XIVE series for pnv. >>> >>>> Thanks, >>>> >>>> C. >>>> >>>>> >>>>> ================================================================================ >>>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h >>>>> index 6d38755f8459..89b9ef7f20b1 100644 >>>>> --- a/include/hw/ppc/xive.h >>>>> +++ b/include/hw/ppc/xive.h >>>>> @@ -319,6 +319,8 @@ typedef struct XiveTCTX { >>>>> qemu_irq os_output; >>>>> >>>>> uint8_t regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE]; >>>>> + >>>>> + QTAILQ_ENTRY(XiveTCTX) next; >>>>> } XiveTCTX; >>>>> >>>>> /* >>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c >>>>> index b7417210d817..f7721c711041 100644 >>>>> --- a/hw/intc/xive.c >>>>> +++ b/hw/intc/xive.c >>>>> @@ -568,6 +568,8 @@ static void xive_tctx_reset(void *dev) >>>>> ipb_to_pipr(tctx->regs[TM_QW3_HV_PHYS + TM_IPB]); >>>>> } >>>>> >>>>> +static QTAILQ_HEAD(, XiveTCTX) xive_tctx_list = >>>>> QTAILQ_HEAD_INITIALIZER(xive_tctx_list); >>>>> + >>>>> static void xive_tctx_realize(DeviceState *dev, Error **errp) >>>>> { >>>>> XiveTCTX *tctx = XIVE_TCTX(dev); >>>>> @@ -609,10 +611,14 @@ static void xive_tctx_realize(DeviceState *dev, >>>>> Error **errp) >>>>> } >>>>> >>>>> qemu_register_reset(xive_tctx_reset, dev); >>>>> + QTAILQ_INSERT_HEAD(&xive_tctx_list, tctx, next); >>>>> } >>>>> >>>>> static void xive_tctx_unrealize(DeviceState *dev, Error **errp) >>>>> { >>>>> + XiveTCTX *tctx = XIVE_TCTX(dev); >>>>> + >>>>> + QTAILQ_REMOVE(&xive_tctx_list, tctx, next); >>>>> qemu_unregister_reset(xive_tctx_reset, dev); >>>>> } >>>>> >>>>> @@ -1385,15 +1391,14 @@ static bool xive_presenter_match(XiveRouter >>>>> *xrtr, uint8_t format, >>>>> bool cam_ignore, uint8_t priority, >>>>> uint32_t logic_serv, XiveTCTXMatch >>>>> *match) >>>>> { >>>>> - CPUState *cs; >>>>> + XiveTCTX *tctx; >>>>> >>>>> /* >>>>> * TODO (PowerNV): handle chip_id overwrite of block field for >>>>> * hardwired CAM compares >>>>> */ >>>>> >>>>> - CPU_FOREACH(cs) { >>>>> - XiveTCTX *tctx = xive_router_get_tctx(xrtr, cs); >>>>> + QTAILQ_FOREACH(tctx, &xive_tctx_list, next) { >>>>> int ring; >>>>> >>>>> /* >>>>> ================================================================================ >>>>> >>>>>> Signed-off-by: Cédric Le Goater <c...@kaod.org> >>>>>> --- >>>>>> hw/intc/xive.c | 8 ++++++++ >>>>>> 1 file changed, 8 insertions(+) >>>>>> >>>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c >>>>>> index b7417210d817..29df06df1136 100644 >>>>>> --- a/hw/intc/xive.c >>>>>> +++ b/hw/intc/xive.c >>>>>> @@ -1396,6 +1396,14 @@ static bool xive_presenter_match(XiveRouter >>>>>> *xrtr, uint8_t format, >>>>>> XiveTCTX *tctx = xive_router_get_tctx(xrtr, cs); >>>>>> int ring; >>>>>> >>>>>> + /* >>>>>> + * Skip partially initialized vCPUs. This can happen when >>>>>> + * vCPUs are hotplugged. >>>>>> + */ >>>>>> + if (!tctx) { >>>>>> + continue; >>>>>> + } >>>>>> + >>>>>> /* >>>>>> * HW checks that the CPU is enabled in the Physical Thread >>>>>> * Enable Register (PTER). >>>>> >>>> >>> >> >