On 4/6/2018 10:24 PM, Zhang, Qi Z wrote:
One more comment

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z
Sent: Friday, April 6, 2018 10:04 PM
To: Guo, Jia <jia....@intel.com>; step...@networkplumber.org; Richardson,
Bruce <bruce.richard...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>;
Ananyev, Konstantin <konstantin.anan...@intel.com>;
gaetan.ri...@6wind.com; Wu, Jingjing <jingjing...@intel.com>;
tho...@monjalon.net; mo...@mellanox.com; Van Haaren, Harry
<harry.van.haa...@intel.com>; Tan, Jianfeng <jianfeng....@intel.com>
Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org; Guo, Jia
<jia....@intel.com>; Zhang, Helin <helin.zh...@intel.com>
Subject: Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism
for hot plug

Hi Jeff:

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
Sent: Friday, April 6, 2018 6:57 PM
To: step...@networkplumber.org; Richardson, Bruce
<bruce.richard...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>;
Ananyev, Konstantin <konstantin.anan...@intel.com>;
gaetan.ri...@6wind.com; Wu, Jingjing <jingjing...@intel.com>;
tho...@monjalon.net; mo...@mellanox.com; Van Haaren, Harry
<harry.van.haa...@intel.com>; Tan, Jianfeng <jianfeng....@intel.com>
Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org; Guo,
Jia <jia....@intel.com>; Zhang, Helin <helin.zh...@intel.com>
Subject: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism
for hot plug

This patch introduces an API (rte_dev_handle_hot_unplug) to handle
device hot unplug event. When device be hot plug out, the device
resource become invalid, if this resource is still be unexpected
read/write, system will crash. The api let user register the hot
unplug handler,
The description is a little bit misleading , based on current implementation.
it does not a function that let user "register a handler", actually It let user 
to
"set a recover point" when some exception (like sigbus) happen due to hot 
unplug.
Maybe the function name could be changed also,
for example: rte_dev_set_recover_point.
make sense , will check the better way demonstrate that.
when hot plug failure occur, the working thread will
be block until the uevent mechanism successful recovery the memory and
guaranty the application keep running smoothly.
Signed-off-by: Jeff Guo <jia....@intel.com>
---
v19->18:
add note for limitation of multiple hotplug
---
  doc/guides/rel_notes/release_18_05.rst  |   6 ++
  kernel/linux/igb_uio/igb_uio.c          |   4 +
  lib/librte_eal/common/include/rte_dev.h |  19 +++++
  lib/librte_eal/linuxapp/eal/eal_dev.c   | 140
+++++++++++++++++++++++++++++++-
  lib/librte_eal/rte_eal_version.map      |   1 +
  5 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst
b/doc/guides/rel_notes/release_18_05.rst
index cb9e050..2707e73 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -70,6 +70,12 @@ New Features

    Linux uevent is supported as backend of this device event
notification framework.

+* **Added hot plug failure handler.**
+
+  Added a failure handler machenism to handle hot unplug device.
+
+  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
+
  API Changes
  -----------

diff --git a/kernel/linux/igb_uio/igb_uio.c
b/kernel/linux/igb_uio/igb_uio.c index 4cae4dd..293c310 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct
inode
*inode)
        struct rte_uio_pci_dev *udev = info->priv;
        struct pci_dev *dev = udev->pdev;

+       /* check if device has been remove before release */
+       if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
+               return -1;
+
        mutex_lock(&udev->lock);
        if (--udev->refcnt > 0) {
                mutex_unlock(&udev->lock);
diff --git a/lib/librte_eal/common/include/rte_dev.h
b/lib/librte_eal/common/include/rte_dev.h
index a5203e7..17c446d 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
   */
  int __rte_experimental
  rte_dev_event_monitor_stop(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It can be used to register the device signal bus handler, and save
+the
+ * current environment for each thread, when signal bus error invoke,
+the
+ * handler would restore the environment by long jmp to each working
+ * thread previous locate, then block the thread to waiting until the
+memory
+ * recovery and remapping be finished, that would guaranty the system
+not
+ * crash when the device be hot unplug.
+ *
+ * @param none
+ * @return
+ *   - From a successful direct invocation, zero.
+ *   - From a call of siglongjmp(), non_zero.
+ */
+int __rte_experimental
+rte_dev_handle_hot_unplug(void);
  #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9478a39..84b7efc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,9 @@

  #include <string.h>
  #include <unistd.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <pthread.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>

@@ -13,12 +16,17 @@
  #include <rte_malloc.h>
  #include <rte_interrupts.h>
  #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_per_lcore.h>

  #include "eal_private.h"

  static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
monitor_started;

+pthread_mutex_t failure_recovery_lock; pthread_cond_t
+failure_recovery_cond;
+
  #define EAL_UEV_MSG_LEN 4096
  #define EAL_UEV_MSG_ELEM_LEN 128

@@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
        EAL_DEV_EVENT_SUBSYSTEM_MAX
  };

+static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
+
+static void sigbus_handler(int signum __rte_unused) {
+       RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
+       siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
+
+static int cmp_dev_name(const struct rte_device *dev,
+       const void *_name)
+{
+       const char *name = _name;
+
+       return strcmp(dev->name, name);
+}
+
  static int
  dev_uev_socket_fd_create(void)
  {
@@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct
rte_dev_event *event, int length)
        return 0;
  }

+static int
+dev_uev_remove_handler(struct rte_device *dev) {
+       struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
+       int ret;
+
+       if (!dev)
+               return -1;
+
+       if (bus->handle_hot_unplug) {
+               /**
+                * call bus ops to handle hot unplug.
+                */
+               ret = bus->handle_hot_unplug(dev);
+               if (ret) {
+                       RTE_LOG(ERR, EAL,
+                               "It cannot handle hot unplug for device (%s) "
+                               "on the bus.\n ",
+                               dev->name);
+                       return ret;
+               }
+       }
+       return 0;
+}
+
  static void
  dev_delayed_unregister(void *param)
  {
@@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
        struct rte_dev_event uevent;
        int ret;
        char buf[EAL_UEV_MSG_LEN];
+       struct rte_bus *bus;
+       struct rte_device *dev;
+       const char *busname;

        memset(&uevent, 0, sizeof(struct rte_dev_event));
        memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
        RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d,
subsystem:%d)\n",
                uevent.devname, uevent.type, uevent.subsystem);

-       if (uevent.devname)
+       switch (uevent.subsystem) {
+       case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+       case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+               busname = "pci";
+               break;
+       default:
+               break;
+       }
+
+       if (uevent.devname) {
+               if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+                       bus = rte_bus_find_by_name(busname);
+                       if (bus == NULL) {
+                               RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+                                       uevent.devname);
+                               return;
+                       }
+                       dev = bus->find_device(NULL, cmp_dev_name,
+                                              uevent.devname);
+                       if (dev == NULL) {
+                               RTE_LOG(ERR, EAL,
+                                       "Cannot find unplugged device (%s)\n",
+                                       uevent.devname);
+                               return;
+                       }
+                       ret = dev_uev_remove_handler(dev);
+                       if (ret) {
+                               RTE_LOG(ERR, EAL, "Driver cannot remap the "
+                                       "device (%s)\n",
+                                       dev->name);
+                               return;
+                       }
+                       /* wake up all the threads */
+                       pthread_cond_broadcast(&failure_recovery_cond);
+               }
                dev_callback_process(uevent.devname, uevent.type);
+       }
  }

  int __rte_experimental
+rte_dev_handle_hot_unplug(void)
+{
+       struct sigaction act;
+       sigset_t mask;
+       int ret = 0;
+
+       /* set signal handlers */
+       memset(&act, 0x00, sizeof(struct sigaction));
+       act.sa_handler = sigbus_handler;
+       sigemptyset(&act.sa_mask);
+       act.sa_flags = SA_RESTART;
+       sigaction(SIGBUS, &act, NULL);
+       sigemptyset(&mask);
+       sigaddset(&mask, SIGBUS);
+       pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
+
+       ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
+       if (ret) {
+               /*
+                * Waitting for condition variable before failure recovery
+                * finish. Now the limitation is only handle one device
+                * hot plug, for multiple devices hotplug, need check if
+                * the device belong to this working thread, then directly
+                * call memory remaping, unrelated thread just keep going
+                * their work by no interrupt from hotplug.
+                * TODO: multiple device hotplug
+                */
+               pthread_mutex_lock(&failure_recovery_lock);
+               RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
+               pthread_cond_wait(&failure_recovery_cond,
+                                       &failure_recovery_lock);
+               RTE_LOG(DEBUG, EAL,
+                      "come back from waiting for failure handler.\n");
+               pthread_mutex_unlock(&failure_recovery_lock);
I think we should not assume phread_cond_wait always happen before
pthread_cond_broadcast, It is possible Sigbus just happen before remap in
udev remove handler while pthread_cond_wait happens after
pthread_cond_broadcast, then we will wait forever.

I think we need a flag to sync
For example:

pthread_mutex_lock(&failure_recovery_lock);
if ( udev_remove_handle == 0 )
        pthread_cond_wait(&failure_recovery_cond, & failure_recovery_lock);
pthread_remove_handle = 0; pthread_mutex_unlock(&failure_recovery_lock);

while at remove handler:
pthread_mutex_lock(&failure_recovery_lock);
pthread_remove_handle = 1;
pthread_cond_signel(&failure_recovery_cond);
pthread_mutex_unlock(&failure_recovery_lock);


Regards
Qi

+       }
+
+       return ret;
+}
+
+
+int __rte_experimental
  rte_dev_event_monitor_start(void)
  {
        int ret;
@@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
                return -1;
        }

+       /* initialize mutex and condition variable
+        * to control failure recovery.
+        */
+       pthread_mutex_init(&failure_recovery_lock, NULL);
+       pthread_cond_init(&failure_recovery_cond, NULL);
+
        monitor_started = true;

        return 0;
@@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
        close(intr_handle.fd);
        intr_handle.fd = -1;
        monitor_started = false;
+
+       pthread_cond_destroy(&failure_recovery_cond);
+       pthread_mutex_destroy(&failure_recovery_lock);
+
        return 0;
  }
diff --git a/lib/librte_eal/rte_eal_version.map
b/lib/librte_eal/rte_eal_version.map
index fc5c62a..873ef38 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -262,5 +262,6 @@ EXPERIMENTAL {
        rte_dev_event_monitor_stop;
        rte_dev_event_callback_register;
        rte_dev_event_callback_unregister;
+       rte_dev_handle_hot_unplug;

  } DPDK_18.02;
--
2.7.4

Reply via email to