> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Friday, November 23, 2018 8:11 PM
> To: Stojaczyk, Dariusz <dariusz.stojac...@intel.com>; dev@dpdk.org
> Cc: tho...@monjalon.net
> Subject: RE: [PATCH] dev: fix attach rollback of a device that was already attached
>
>
>
> > -----Original Message-----
> > From: Stojaczyk, Dariusz
> > Sent: Friday, November 23, 2018 6:45 AM
> > To: dev@dpdk.org
> > Cc: tho...@monjalon.net; Stojaczyk, Dariusz <dariusz.stojac...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>
> > Subject: [PATCH] dev: fix attach rollback of a device that was already attached
> >
> > When the primary process receives an IPC attach request for a device that's already
> > locally attached, it doesn't set up its variables properly and is prone to segfaulting
> > on a subsequent rollback.
> >
> > `ret = local_dev_probe(req->devargs, &dev)`
> >
> > The above function will set the `dev` pointer to the proper device *unless* it returns
> > with an error. One of those errors is -EEXIST, which the hotplug function explicitly
> > ignores. For -EEXIST, it proceeds with attaching the device and expects the dev
> > pointer to be valid.
>
> Good catch.
> >
> > Despite this patch being a fix, it also introduces a design decision: when any
> > secondary process fails to attach a device, the primary process that already had
> > the device attached won't attempt to detach that device locally as part of the
> > rollback routine.
> > The primary process would have already printed a message "Failed to [...] on
> > secondary" and now it will also print a warning "Devices may not be in sync [...]".
>
> I have a slight concern about this.
> We should try to avoid the abnormal situation where devices are not in sync.
> The scenario you describe actually starts from an abnormal situation caused by
> some previous error, so it would be better to always take the chance to end up
> in a normal state.
>
> It looks better to me if we fix it in local_dev_probe so that it returns a valid
> device along with -EEXIST.
Actually that was my original idea, but I gave it up in the end.
Ok, I'll do that in V2.
Thanks,
D.
>
> >
> > Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
> > Cc: qi.z.zh...@intel.com
> >
> > Signed-off-by: Darek Stojaczyk <dariusz.stojac...@intel.com>
> > ---
> > lib/librte_eal/common/hotplug_mp.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
> > index 7c9fcc46c..7ee074a31 100644
> > --- a/lib/librte_eal/common/hotplug_mp.c
> > +++ b/lib/librte_eal/common/hotplug_mp.c
> > @@ -88,7 +88,7 @@ __handle_secondary_request(void *param)
> > (const struct eal_dev_mp_req *)msg->param;
> > struct eal_dev_mp_req tmp_req;
> > struct rte_devargs *da;
> > - struct rte_device *dev;
> > + struct rte_device *dev = NULL;
> > struct rte_bus *bus;
> > int ret = 0;
> >
> > @@ -168,7 +168,15 @@ __handle_secondary_request(void *param)
> > if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
> > tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
> > eal_dev_hotplug_request_to_secondary(&tmp_req);
> > - local_dev_remove(dev);
> > + if (dev == NULL) {
> > + /* device was already attached at the time we got the
> > + * request, don't detach it now.
> > + */
> > + RTE_LOG(WARNING, EAL,
> > + "Devices in secondary may not sync with primary\n");
> > + } else {
> > + local_dev_remove(dev);
> > + }
> > } else {
> > tmp_req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
> > eal_dev_hotplug_request_to_secondary(&tmp_req);
> > --
> > 2.17.1
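
For reference, a minimal standalone sketch of the two approaches discussed in this thread. It does not use the DPDK API: `struct mock_device`, `mock_probe_untouched()`, `mock_probe_sets_dev()` and `mock_rollback()` are all hypothetical stand-ins, and the real `local_dev_probe()` may behave differently in detail. The first probe variant leaves the output pointer untouched on -EEXIST (the behaviour the patch guards against with a NULL check), while the second fills it in even on -EEXIST, which is roughly what is being proposed for V2.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_device; not the DPDK definition. */
struct mock_device {
	const char *name;
	int attached;
};

/* A single device that is assumed to be attached already. */
struct mock_device mock_dev = { "mock0", 1 };

/*
 * Mimics the behaviour described in the patch: on -EEXIST the output
 * pointer is left untouched, so the caller must initialise it to NULL.
 */
int
mock_probe_untouched(struct mock_device **out)
{
	(void)out;
	return -EEXIST;
}

/*
 * Mimics the alternative suggested in review: the output pointer is set
 * to the existing device even though -EEXIST is returned.
 */
int
mock_probe_sets_dev(struct mock_device **out)
{
	*out = &mock_dev;
	return -EEXIST;
}

/*
 * Rollback step guarded the same way as in the patch.
 * Returns 1 if the device was detached, 0 if the detach was skipped.
 */
int
mock_rollback(struct mock_device *dev)
{
	if (dev == NULL)
		return 0; /* nothing to detach, devices may be out of sync */
	dev->attached = 0;
	return 1;
}
```

With the first variant the rollback skips the detach and the device stays attached in the primary; with the second variant the rollback detaches it, so primary and secondary end up in the same state.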