On Fri, 15 Jun 2018 22:32:44 +1000 David Gibson <da...@gibson.dropbear.id.au> wrote:
> On Fri, Jun 15, 2018 at 10:01:47AM +0200, Greg Kurz wrote:
> > On Fri, 15 Jun 2018 09:07:24 +0200
> > Greg Kurz <gr...@kaod.org> wrote:
> > 
> > > On Fri, 15 Jun 2018 16:29:15 +1000
> > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > > 
> > > > On Fri, Jun 15, 2018 at 07:58:05AM +0200, Greg Kurz wrote:
> > > > > On Fri, 15 Jun 2018 10:14:31 +1000
> > > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > > > > 
> > > > > > On Fri, Jun 15, 2018 at 10:02:25AM +1000, David Gibson wrote:
> > > > > > > On Thu, Jun 14, 2018 at 11:50:42PM +0200, Greg Kurz wrote:
> > > > > > > > The spapr_realize_vcpu() function doesn't roll back in case of error.
> > > > > > > > This isn't a problem with coldplugged CPUs because the machine won't
> > > > > > > > start and QEMU will exit. Hotplug is a different story though: the
> > > > > > > > CPU thread is started under object_property_set_bool() and it assumes
> > > > > > > > it can access the CPU object.
> > > > > > > > 
> > > > > > > > If icp_create() fails, we return an error without unregistering the
> > > > > > > > reset handler for this CPU, and we leave the underlying QEMU thread for
> > > > > > > > this CPU alive. Since spapr_cpu_core_realize() doesn't care to unrealize
> > > > > > > > already realized CPUs either, but happily frees all of them anyway, the
> > > > > > > > CPU thread crashes instantly:
> > > > > > > > 
> > > > > > > > (qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
> > > > > > > > GKU: failing icp_create (cpu 0x11497fd0)
> > > > > > > >                              ^^^^^^^^^^
> > > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > > [Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
> > > > > > > > 0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
> > > > > > > >                                                   ^^^^^^^^^^^^^^
> > > > > > > >                                                   pointer to the CPU object
> > > > > > > > 623         trace_object_dynamic_cast_assert(obj ? obj->class->type->name
> > > > > > > > (gdb) p obj->class->type
> > > > > > > > $1 = (Type) 0x0
> > > > > > > > (gdb) p * obj
> > > > > > > > $2 = {class = 0x10ea9c10, free = 0x11244620,
> > > > > > > >                                  ^^^^^^^^^^
> > > > > > > >                                  should be g_free
> > > > > > > > (gdb) p g_free
> > > > > > > > $3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>
> > > > > > > > 
> > > > > > > > obj is a dangling pointer to the CPU that was just destroyed in
> > > > > > > > spapr_cpu_core_realize().
> > > > > > > > 
> > > > > > > > This patch adds proper rollback to both spapr_realize_vcpu() and
> > > > > > > > spapr_cpu_core_realize().
> > > > > > > > 
> > > > > > > > Signed-off-by: Greg Kurz <gr...@kaod.org>
> > > > > > > 
> > > > > > > Applied to ppc-for-3.0, since it definitely looks to fix some
> > > > > > > problems.
> > > > > > 
> > > > > > Uh.. actually it has a definite bug - the first exit point will call
> > > > > > g_free() on an uninitialized spapr_cpu. I fixed it up with a NULL
> > > > > > initialization in my tree.
> > > > > 
> > > > > Ah... as said in the cover letter, all the series is based on machine_data
> > > > > being set before the call to object_property_set_bool()... Maybe I should
> > > > > have made that explicit with a preparatory patch... Sorry.
> > > > 
> > > > Ah, that makes sense.
> > > > 
> > > > So, I ended up having to rework a little differently, after I yanked
> > > > by intc -> machine_data patch because it broke things for clg. I
> > > > think I've fixed it up correctly now - if you can check the latest
> > > > ppc-for-3.0 I pushed out, that would be great.
> > > > 
> > > 
> > > I'll do this ASAP.
> > 
> > Oops, I've just spotted a nit in my original patch, that causes
> > QEMU to crash if threads > 1... but I had only tested with single
> > threaded cores :)
> > 
> > > +err_unrealize:
> > > +    while (--j >= 0) {
> > > +        spapr_unrealize_vcpu(sc->threads[i]);
> >                                            ^^^
> >                                        should be j
> 
> Ah, yes. I've fixed that up in my tree.

> +        spapr_unrealize_vcpu(sc->threads[j);

Almost fixed ;)

> > 
> > > Apart from that, it looks good.
> > > 
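
For reference, here is a minimal standalone sketch of the rollback pattern
discussed above: realize threads in order and, on failure, unrealize only
the ones already realized, indexing with the same counter that walks back.
All names here (DemoThread, demo_realize_thread(), demo_unrealize_thread(),
demo_realize_core(), the fail_at parameter) are hypothetical stand-ins for
illustration and are not the actual QEMU code or API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU thread; not QEMU's actual type. */
typedef struct {
    bool realized;
} DemoThread;

/* Hypothetical realize step: fails for the thread at index fail_at. */
static bool demo_realize_thread(DemoThread *t, int index, int fail_at)
{
    if (index == fail_at) {
        return false;
    }
    t->realized = true;
    return true;
}

/* Hypothetical unrealize step: undoes demo_realize_thread(). */
static void demo_unrealize_thread(DemoThread *t)
{
    t->realized = false;
}

/*
 * Realize every thread of a core. On failure, unrealize the threads
 * that were already realized, in reverse order. Returns true on success.
 */
static bool demo_realize_core(DemoThread *threads, int nr_threads,
                              int fail_at)
{
    int j;

    for (j = 0; j < nr_threads; j++) {
        if (!demo_realize_thread(&threads[j], j, fail_at)) {
            goto err_unrealize;
        }
    }
    return true;

err_unrealize:
    /*
     * j is the index that failed, so walk back over 0..j-1. The loop
     * must index with j itself, not with some other counter.
     */
    while (--j >= 0) {
        demo_unrealize_thread(&threads[j]);
    }
    return false;
}
```

With more than one thread per core, indexing the rollback loop with any
variable other than the one being decremented touches the wrong element,
which is why the problem only shows up when threads > 1.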