Am 11.03.2015 um 12:11 schrieb Eduardo Habkost: > On Tue, Mar 10, 2015 at 11:43:41PM +0100, Andreas Färber wrote: >> Am 10.03.2015 um 22:57 schrieb Eduardo Habkost: >>> Instead of passing icc_bridge from the PC initialization code to >>> cpu_x86_create(), make the PC initialization code attach the CPU to >>> icc_bridge. >>> >>> The only difference here is that icc_bridge attachment will now be done >>> after x86_cpu_parse_featurestr() is called. But this shouldn't make any >>> difference, as property setters shouldn't depend on icc_bridge. >>> >>> Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> >>> --- >>> Changes v1 -> v2: >>> * Keep existing check for NULL icc_bridge and error reporting, instead >>> of assing assert(icc_bridge) >>> --- >>> hw/i386/pc.c | 13 +++++++++++-- >>> target-i386/cpu.c | 14 ++------------ >>> target-i386/cpu.h | 3 +-- >>> 3 files changed, 14 insertions(+), 16 deletions(-) >>> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >>> index b5b2aad..a26e0ec 100644 >>> --- a/hw/i386/pc.c >>> +++ b/hw/i386/pc.c >>> @@ -992,18 +992,27 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int >>> level) >>> static X86CPU *pc_new_cpu(const char *cpu_model, int64_t apic_id, >>> DeviceState *icc_bridge, Error **errp) >>> { >>> - X86CPU *cpu; >>> + X86CPU *cpu = NULL; >>> Error *local_err = NULL; >>> >>> - cpu = cpu_x86_create(cpu_model, icc_bridge, &local_err); >>> + if (icc_bridge == NULL) { >>> + error_setg(&local_err, "Invalid icc-bridge value"); >>> + goto out; >>> + } >>> + >>> + cpu = cpu_x86_create(cpu_model, &local_err); >> >> We had previously discussed reference counting. Here I would expect: > > I will try to extend the analysis with ownership of each reference: > >> >> OBJECT(cpu)->ref == 1 > > And the owner of the reference is pc_new_cpu() (cpu variable). > >> >>> if (local_err != NULL) { >>> error_propagate(errp, local_err); >>> return NULL; >>> } >>> >>> + qdev_set_parent_bus(DEVICE(cpu), qdev_get_child_bus(icc_bridge, >>> "icc")); >> >> OBJECT(cpu)->ref == 2 > > And the owners are: pc_new_cpu()/cpu and icc_bridge. > > Now, what if we error out and destroy the CPU after we already added the > CPU to icc_bridge? Is icc_bridge going to keep pointing to the dead > object, or is there some bus-detach magic somewhere that will make it > work?
Lowering the ref count to 0 will trigger object_finalize(), which via object_property_del_all() triggers unparenting via object_finalize_child_property(), which in the device case causes unrealization of the device and unparenting of its owned buses (qdev.c:device_unparent()). >>> + object_unref(OBJECT(cpu)); >> >> OBJECT(cpu)->ref == 1 > > Here pc_new_cpu() is dropping its reference too early! In practice it is > now borrowing the reference owned by icc_bridge, and I don't think we > should do that. > > I just kept the object_unref() call here because I didn't want to change > any behavior when moving code, but I think it doesn't belong here. Agree. In my patches I placed it after the last usage of cpu in this function, but actually since it is the return value it would even need to go into the callers. pc_cpus_init() is currently ignoring the return value. For using the CPU as a child<> of a CPU core object that changes in my series, so I'll fix that. >>> + >>> object_property_set_int(OBJECT(cpu), apic_id, "apic-id", &local_err); >>> object_property_set_bool(OBJECT(cpu), true, "realized", &local_err); >> >> OBJECT(cpu)->ref == 1 or 2 depending on DeviceClass::realize :) > > If it's 2, it won't be our problem as we don't own the extra reference. > It's the responsibility of whoever grabbed the extra reference. > > But I assume the property setters above MUST not add any extra reference > in case they return an error. Correct? So, the extra reference would be the one added by device_set_realized() for /machine/unattached parent. That happens as very first thing on false -> true transition. If an error occurs either in DeviceClass::realize or later in device_set_realized() then that reference will remain, but it is re-entrant in only doing so once. I have refreshing and combining the two competing qom-tree HMP patches on my radar and am starting to think that reference counting information may be a third interesting mode of output... >>> >>> +out: >>> if (local_err) { >>> error_propagate(errp, local_err); >>> object_unref(OBJECT(cpu)); >> > > And here we have something that was already broken: X86CPU instance_init > calls cpu_exec_init(), the CPU is added to the global CPU list without > increasing reference counting, and the global list will point to a > destroyed object if we enter the error path. > > In other words, if anything fails after cpu_exec_init() is called, we're > screwed. In the future it will be on realize, but we probably need to > move it closer to the end of realize. Yeah, most of the error handling kind of assumes that when an error occurs we report the Error* upwards and exit QEMU, with the user restarting from scratch. With CPU hotplug that is a nasty assumption. >> object_unref(NULL) looks unusual but is valid. > > Yes. Makes the code simpler. :) > >> >> Should we change the return NULL to jump here, too, then? > > Yes, the cpu_x86_create() error check could just do a "goto out". I assume you are planning to apply this patch through your tree? Will you submit a v3? Otherwise can you tweak when applying and let me know so I can cherry-pick? Thanks. >> OBJECT(cpu)->ref == 0 or 1 >> >> I wonder whether we need another object_unref(OBJECT(cpu)) for the >> non-error case, either here or in the callers? Out of scope for this >> patch, of course. > > So, how I see it: if we are returning a reference to the object, now it > belongs to the caller, and it should be the caller responsibility to > call object_unref(). So the non-error object_unref() after > qdev_set_parent_bus() is not supposed to be in pc_new_cpus(), but in the > callers. Yeah, agree. > Either way we choose, we should document it so callers know who > owns the reference they get. > > Alternatively, we could simply change pc_new_cpu() to _not_ return a > pointer to the CPU, and make pc_cpus_init() deal with the APIC MMIO > mapping using some another approach. No, that won't work for me. But my question was rather whether changes for the error case will be needed? If the reference for /machine/unassigned/device[n] remains around, then our local object_unref() won't unparent/finalize the device, meaning it stays around. If we do an additional unref then unparenting would make the ref count go to -1. So we may need to unparent it explicitly here in this error path? Hard to test. Regards, Andreas -- SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton; HRB 21284 (AG Nürnberg)