On 10/2/2018 11:53 PM, Burakov, Anatoly wrote:
On 02-Oct-18 1:35 PM, Jeff Guo wrote:
The mechanism initially registers the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it checks
the failure address and handles the memory failure of the
corresponding device by invoking the hot-unplug handler. This can prevent
the application from crashing when a device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable/disable
the device hotplug handling mechanism. Note that these functions only
implement the hot-unplug handler; other hotplug handlers, such as a
handler for hotplug binding, could be added in the future if needed:
   - rte_dev_hotplug_handle_enable
   - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo <jia....@intel.com>
---
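
For readers following the thread, here is a minimal standalone sketch of the flow the commit message describes: install a SIGBUS handler, check whether the fault address belongs to a tracked device region, and otherwise fall back to whatever handler was installed before. It is not the patch's code. The names dev_region, region_add, fault_owner_check, sigbus_sketch_handler and sigbus_sketch_register are hypothetical, and the address lookup only stands in for what the patch delegates to rte_bus_sigbus_handler(). The fallback branch also assumes the previous action may be SIG_DFL, SIG_IGN, or an SA_SIGINFO-style handler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device's mapped memory region. */
struct dev_region {
	uintptr_t start;
	size_t len;
};

#define MAX_REGIONS 4
static struct dev_region regions[MAX_REGIONS];
static size_t nb_regions;
static struct sigaction sigbus_old; /* action in place before ours */

/* Track a region; returns 0 on success, -1 if the table is full. */
static int
region_add(void *start, size_t len)
{
	if (nb_regions == MAX_REGIONS)
		return -1;
	regions[nb_regions].start = (uintptr_t)start;
	regions[nb_regions].len = len;
	nb_regions++;
	return 0;
}

/* Returns 0 if addr lies in a tracked region, 1 if no region owns it. */
static int
fault_owner_check(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t i;

	for (i = 0; i < nb_regions; i++)
		if (a >= regions[i].start &&
		    a - regions[i].start < regions[i].len)
			return 0;
	return 1;
}

static void
sigbus_sketch_handler(int signum, siginfo_t *info, void *ctx)
{
	if (fault_owner_check(info->si_addr) == 0) {
		/* A real handler must repair or replace the mapping here;
		 * returning without doing so would fault again at once. */
		return;
	}

	/* Not a device fault: hand over to the previous action. */
	if (sigbus_old.sa_flags & SA_SIGINFO) {
		sigbus_old.sa_sigaction(signum, info, ctx);
	} else if (sigbus_old.sa_handler == SIG_DFL) {
		/* Restore the default action and re-raise the signal. */
		sigaction(signum, &sigbus_old, NULL);
		raise(signum);
	} else if (sigbus_old.sa_handler != SIG_IGN) {
		sigbus_old.sa_handler(signum);
	}
	/* SIG_IGN: the previous owner ignored SIGBUS, so do nothing. */
}

/* Install the handler and remember the old action; 0 on success. */
static int
sigbus_sketch_register(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = sigbus_sketch_handler;
	sa.sa_flags = SA_SIGINFO; /* needed to receive info->si_addr */
	return sigaction(SIGBUS, &sa, &sigbus_old);
}
```

SA_SIGINFO is what makes info->si_addr available to the handler. The explicit SIG_DFL and SIG_IGN checks are a design choice of this sketch: both are special values rather than callable functions, so they are not invoked through sa_handler.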

<snip>

+static void sigbus_handler(int signum, siginfo_t *info,
+                void *ctx __rte_unused)
+{
+    int ret;
+
+    RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+        (int)pthread_self(), info->si_addr);
+
+    rte_spinlock_lock(&failure_handle_lock);
+    ret = rte_bus_sigbus_handler(info->si_addr);
+    rte_spinlock_unlock(&failure_handle_lock);
+    if (ret == -1) {
+        rte_exit(EXIT_FAILURE,
+             "Failed to handle SIGBUS for hot-unplug, "
+             "(rte_errno: %s)!", strerror(rte_errno));

Do we really want to exit the application on sigbus handle failure?


Yes, we definitely do, since it is a failure of the process. I agree with Konstantin's reply in the other mail.


+    } else if (ret == 1) {
+        if (sigbus_action_old.sa_handler)
+            (*(sigbus_action_old.sa_handler))(signum);
+        else
+            rte_exit(EXIT_FAILURE,
+                 "Failed to handle generic SIGBUS!");
+    }
+
+    RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");

Again, does this all need to be with INFO log level? IMO it should be DEBUG.


I am fine with that.


+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+    const void *_name)
+{
+    const char *name = _name;
+
+    return strcmp(dev->name, name);
+}
+
  static int

<snip>

    int __rte_experimental
@@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
      close(intr_handle.fd);
      intr_handle.fd = -1;
      monitor_started = false;
+
      return 0;

This looks like unintended change.


No, I intended this change, to make it consistent with the formatting elsewhere.


  }
+
+int __rte_experimental
+rte_dev_sigbus_handler_register(void)
+{
+    sigset_t mask;
+    struct sigaction action;
+

<snip>

--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
      rte_dev_event_callback_unregister;
      rte_dev_event_monitor_start;
      rte_dev_event_monitor_stop;
+    rte_dev_hotplug_handle_enable;
+    rte_dev_hotplug_handle_disable;

Nitpicking - disable should be above enable, as E follows D in the alphabet :)


Yes, after rechecking the alphabet, it is definitely as you said. :)


      rte_dev_iterator_init;
      rte_dev_iterator_next;
      rte_devargs_add;


