On Tue, 1 Nov 2016 21:40:46 +0800 Xiao Guangrong <guangrong.x...@linux.intel.com> wrote:
> On 11/01/2016 06:35 PM, Igor Mammedov wrote:
> > On Tue, 1 Nov 2016 11:30:46 +0800
> > Xiao Guangrong <guangrong.x...@linux.intel.com> wrote:
> >
> >> On 10/31/2016 07:09 PM, Igor Mammedov wrote:
> >>> On Mon, 31 Oct 2016 17:52:24 +0800
> >>> Xiao Guangrong <guangrong.x...@linux.intel.com> wrote:
> >>>
> >>>> On 10/31/2016 05:45 PM, Igor Mammedov wrote:
> >>>>> On Sun, 30 Oct 2016 23:25:05 +0200
> >>>>> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> >>>>>
> >>>>>> From: Xiao Guangrong <guangrong.x...@linux.intel.com>
> >>>>>>
> >>>>>> The buffer is used to save the FIT info for all the present nvdimm
> >>>>>> devices; it is updated after an nvdimm device is plugged or
> >>>>>> unplugged. In a later patch, it will be used to construct the NVDIMM
> >>>>>> ACPI _FIT method, which reflects the present nvdimm devices after
> >>>>>> nvdimm hotplug.
> >>>>>>
> >>>>>> As the FIT buffer cannot be completely mapped into guest address
> >>>>>> space, OSPM will exit to QEMU multiple times; however, there is a
> >>>>>> race condition - the FIT may be changed during these multiple exits,
> >>>>>> so some rules are introduced:
> >>>>>> 1) the user should hold the @lock to access the buffer and
> >>> Could you explain why the lock is needed?
> >>
> >> Yes. There are two things that need to be synced between QEMU and OSPM:
> >> - the fit buffer. QEMU produces it and the VM consumes it at the same
> >>   time.
> >> - the dirty indicator. QEMU sets it and it is cleared by OSPM; that
> >>   means both sides modify it.
> >
> > I understand why the 'dirty indicator' is necessary, but
> > could you describe a concrete call flow in detail
> > that would cause concurrent access and need extra
> > lock protection?
> >
> > Note that there is a global lock (dubbed BQL) in QEMU,
> > which is taken every time the guest accesses an IO port or
> > a QMP/monitor control event happens.
>
> Ah, okay. If there is a lock to protect the vm-exit handler and the
> monitor handler, I think it is okay to drop the lock.
>
> >
> >>>>>> 2) mark @dirty whenever the buffer is updated.
> >>>>>>
> >>>>>> @dirty is cleared the first time OSPM gets the fit buffer; if
> >>>>>> dirty is detected on a later access, OSPM will restart the
> >>>>>> access.
> >>>>>>
> >>>>>> As the fit should be updated after an nvdimm device is successfully
> >>>>>> realized, a new hotplug callback, post_hotplug, is introduced.
> >>>>>>
> >>>>>> Signed-off-by: Xiao Guangrong <guangrong.x...@linux.intel.com>
> >>>>>> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> >>>>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >>>>>> ---
> >>>>>>  include/hw/hotplug.h    | 10 +++++++++
> >>>>>>  include/hw/mem/nvdimm.h | 26 +++++++++++++++++++++-
> >>>>>>  hw/acpi/nvdimm.c        | 59 +++++++++++++++++++++++++++++++++++--------------
> >>>>>>  hw/core/hotplug.c       | 11 +++++++++
> >>>>>>  hw/core/qdev.c          | 20 +++++++++++++----
> >>>>>>  hw/i386/acpi-build.c    |  2 +-
> >>>>>>  hw/i386/pc.c            | 19 ++++++++++++++++
> >>>>>>  7 files changed, 124 insertions(+), 23 deletions(-)
> >>>>>>
> >>>>>> diff --git a/include/hw/hotplug.h b/include/hw/hotplug.h
> >>>>>> index c0db869..10ca5b6 100644
> >>>>>> --- a/include/hw/hotplug.h
> >>>>>> +++ b/include/hw/hotplug.h
> >>>>>> @@ -47,6 +47,7 @@ typedef void (*hotplug_fn)(HotplugHandler *plug_handler,
> >>>>>>   * @parent: Opaque parent interface.
> >>>>>>   * @pre_plug: pre plug callback called at start of device.realize(true)
> >>>>>>   * @plug: plug callback called at end of device.realize(true).
> >>>>>> + * @post_plug: post plug callback called after device is
> >>>>>> successfully plugged.
> >>>>> this doesn't seem to be necessary,
> >>>>> why isn't hotplug_handler_plug() sufficient?
> >>>>
> >>>> At the point of hotplug_handler_plug(), the device is not realized
> >>>> (realized == 0); however, nvdimm_get_plugged_device_list() works on
> >>>> realized nvdimm devices.
> >>>
> >>> I suggest, instead of adding a redundant hook, to reuse
> >>> hotplug_handler_plug(), which is called after device::realize()
> >>> successfully completed, and amend nvdimm_plugged_device_list() as
> >>> follows:
> >>>
> >>> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> >>> index e486128..c6d8cbb 100644
> >>> --- a/hw/acpi/nvdimm.c
> >>> +++ b/hw/acpi/nvdimm.c
> >>> @@ -38,11 +38,7 @@ static int nvdimm_plugged_device_list(Object *obj, void *opaque)
> >>>      GSList **list = opaque;
> >>>
> >>>      if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
> >>> -        DeviceState *dev = DEVICE(obj);
> >>> -
> >>> -        if (dev->realized) { /* only realized NVDIMMs matter */
> >>> -            *list = g_slist_append(*list, DEVICE(obj));
> >>> -        }
> >>> +        *list = g_slist_append(*list, obj);
> >>>      }
> >>>
> >>>      object_child_foreach(obj, nvdimm_plugged_device_list, opaque);
> >>>
> >>> that should count not only the already present nvdimms but also the one
> >>> that's being hotplugged.
> >>
> >> It is not good, as a device which failed to initialize
> > See device_set_realized()
> > [...]
> >         if (dc->realize) {
> >             dc->realize(dev, &local_err);
> >         }
> >
> >         if (local_err != NULL) {
> >             goto fail;
> >         }
> >
> >         DEVICE_LISTENER_CALL(realize, Forward, dev);
> >
> >         if (hotplug_ctrl) {
> >             hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
> >         }
> >
> > i.e. control reaches hotplug_handler_plug() only if the
> > dc->realize(dev, &local_err) call is successful.
>
> I mean, for example: if there are two devices, the first one fails to be
> realized and the second one is initialized successfully, then can
> nvdimm_plugged_device_list() get both devices?
>
> Based on the code of pc_dimm_built_list(), I guess yes.
nope, read qdev_device_add()
end result:
 - on success the device is in the QOM tree
 - on failure the device is destroyed

>
> >> or is not being plugged
> >> (consider two nvdimm devices directly appended on the QEMU command line:
> >> when we plug the first one, both nvdimm devices will be counted in this
> >> list) is also counted in this list...
> > nope,
> > qdev_device_add() is called sequentially (one time for each
> > -device/device_add),
> > so nvdimm_plugged_device_list() sees only the devices that have been
> > added by previous -device/device_add calls, plus one extra device that
> > is being added by the in-progress qdev_device_add() call.
> >
> > so for "-device nvdimm,id=nv1 -device nvdimm,id=nv2" the call sequence
> > would look like:
> > 1:
> >  qdev_device_add("nvdimm,id=nv1") {
> >      -> set parent // device becomes visible to
> >                    // nvdimm_get_plugged_device_list()
> >                    // this is the only time where device->realized == false
> >                    // could be observed, all other call chains would either
> >                    // see device->realized == true or not see device at all
> >      -> realize()
> >           -> device_set_realized()
> >                -> nvdimm->realize()
> >                -> hotplug_handler_plug()
> >                       nvdimm_get_plugged_device_list()
> >                       // would return list with 1 item (nv1,
> >                       // where nv1->realized == false)
> >  }
> > 2:
> >  qdev_device_add("nvdimm,id=nv2") {
> >      -> set parent // device becomes visible to
> >                    // nvdimm_get_plugged_device_list()
> >      -> realize()
> >           -> device_set_realized()
> >                -> nvdimm->realize()
> >                -> hotplug_handler_plug()
> >                       nvdimm_get_plugged_device_list()
> >                       // would return list with 2 items (nv1 and nv2,
> >                       // where nv1->realized == true and
> >                       // nv2->realized == false)
> >  }
>
> Okay.
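
[Editor's note: the commit message above describes a restart-on-dirty read
protocol for the FIT buffer. The following is a minimal C sketch of that
idea, not code from the patch; FitBuffer, FIT_CHUNK, fit_read_chunk() and
fit_read_all() are illustrative names, not QEMU APIs, and the real series
tracks the read position on the QEMU side. It only shows why clearing
@dirty on the first access and restarting when @dirty is seen later gives
OSPM a consistent snapshot even though the buffer is fetched across
multiple VM exits.]

    /* Sketch of the restart-on-dirty FIT read rule (assumed names). */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define FIT_CHUNK 4096  /* assumed per-exit transfer size */

    typedef struct FitBuffer {
        const char *data;
        size_t len;
        bool dirty;     /* set by QEMU on nvdimm plug/unplug */
    } FitBuffer;

    /* One VM exit: copy the chunk at @offset; fail if FIT changed mid-read. */
    static bool fit_read_chunk(FitBuffer *fit, size_t offset,
                               char *out, size_t *out_len)
    {
        if (offset == 0) {
            fit->dirty = false;      /* first access clears the indicator */
        } else if (fit->dirty) {
            return false;            /* buffer was updated: caller restarts */
        }
        *out_len = fit->len - offset < FIT_CHUNK ? fit->len - offset
                                                 : FIT_CHUNK;
        memcpy(out, fit->data + offset, *out_len);
        return true;
    }

    /* OSPM side: restart from offset 0 until a consistent snapshot is read. */
    static size_t fit_read_all(FitBuffer *fit, char *dst)
    {
        size_t off;
    restart:
        off = 0;
        while (off < fit->len) {
            size_t n;
            if (!fit_read_chunk(fit, off, dst + off, &n)) {
                goto restart;
            }
            off += n;
        }
        return off;
    }

Note that, per the discussion above, both sides of this exchange already run
under the BQL in real QEMU (vm-exit and monitor handlers alike), which is why
the extra per-buffer @lock could be dropped.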