When the primary process receives an IPC attach
request for a device that is already attached
locally, it does not set up its variables properly
and is prone to segfaulting on a subsequent rollback.

`ret = local_dev_probe(req->devargs, &dev)`

The above function sets the `dev` pointer to the
proper device *unless* it returns an error. One of
those errors is -EEXIST, which the hotplug function
explicitly ignores. For -EEXIST, it proceeds with
attaching the device and expects the dev pointer to
be valid, but the pointer was never initialized, so
a later rollback passes garbage to local_dev_remove().

Although this patch is a fix, it also introduces a
design decision: when any secondary process fails to
attach a device, a primary process that already had
the device attached won't attempt to detach that
device locally as part of the rollback routine.
The primary process will have already printed the
message "Failed to [...] on secondary", and now it
will also print the warning "Devices in secondary may
not sync with primary".
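
The intended behaviour can be illustrated outside of
DPDK with a minimal sketch. Everything named toy_* below
is hypothetical and only mimics the pattern; it is not
the EAL code. It assumes a probe function that writes
its output pointer only on success, and a caller that
ignores -EEXIST and uses a NULL pointer as the signal
that the device was already attached:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device record; not the real struct rte_device. */
struct toy_device {
	const char *name;
	int attached;
};

/*
 * Hypothetical stand-in for local_dev_probe(): *out is written only
 * on success. On -EEXIST the caller's pointer is left untouched.
 */
static int toy_probe(struct toy_device *d, struct toy_device **out)
{
	assert(d != NULL && out != NULL);
	if (d->attached)
		return -EEXIST;
	d->attached = 1;
	*out = d;
	return 0;
}

/* Hypothetical stand-in for local_dev_remove(). */
static void toy_remove(struct toy_device *d)
{
	d->attached = 0;
}

/*
 * Attach locally, then roll back as if a secondary had failed.
 * Returns 1 if the rollback detached the device, 0 if the device was
 * left attached, or a negative errno if the probe failed outright.
 */
static int toy_attach_then_rollback(struct toy_device *d)
{
	struct toy_device *dev = NULL; /* NULL means "not attached by us" */
	int ret = toy_probe(d, &dev);

	if (ret != 0 && ret != -EEXIST)
		return ret;

	if (dev == NULL)
		return 0; /* already attached before the request: keep it */
	toy_remove(dev);
	return 1;
}
```

In this sketch, a device that was attached before the
request stays attached after the rollback, while a
device attached by the request itself is removed again.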

Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
Cc: qi.z.zh...@intel.com

Signed-off-by: Darek Stojaczyk <dariusz.stojac...@intel.com>
---
 lib/librte_eal/common/hotplug_mp.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
index 7c9fcc46c..7ee074a31 100644
--- a/lib/librte_eal/common/hotplug_mp.c
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -88,7 +88,7 @@ __handle_secondary_request(void *param)
                (const struct eal_dev_mp_req *)msg->param;
        struct eal_dev_mp_req tmp_req;
        struct rte_devargs *da;
-       struct rte_device *dev;
+       struct rte_device *dev = NULL;
        struct rte_bus *bus;
        int ret = 0;
 
@@ -168,7 +168,15 @@ __handle_secondary_request(void *param)
        if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
                tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
                eal_dev_hotplug_request_to_secondary(&tmp_req);
-               local_dev_remove(dev);
+               if (dev == NULL) {
+                       /* device was already attached at the time we got the
+                        * request, don't detach it now.
+                        */
+                       RTE_LOG(WARNING, EAL,
+                               "Devices in secondary may not sync with primary\n");
+               } else {
+                       local_dev_remove(dev);
+               }
        } else {
                tmp_req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
                eal_dev_hotplug_request_to_secondary(&tmp_req);
-- 
2.17.1
