Re: [PATCH 2/2] ethdev: fix race condition in fast-path ops setup

Konstantin Ananyev Sun, 26 Feb 2023 09:22:27 -0800

If ethdev enqueue or dequeue function is called during
eth_dev_fp_ops_setup(), it may get pre-empted after setting the
function pointers, but before setting the pointer to port data.
In this case the newly registered enqueue/dequeue function will
use dummy port data and end up in seg fault.

This patch moves the updation of each data pointers before
updating corresponding function pointers.

Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate
structure")
Cc: [email protected]


Why is something calling enqueue/dequeue when device is not fully

started.

A correctly written application would not call rx/tx burst until
after ethdev start had finished.


Please refer the eb0d471a894 (ethdev: add proactive error handling
mode), when driver recover itself, the application may still invoke

enqueue/dequeue API.

Right now DPDK ethdev layer *does not* provide synchronization
mechanisms between data-path and control-path functions.
That was a deliberate deisgn choice. If we want to change that rule, then I
suppose we need a community consensus for it.
I think that if the driver wants to provide some sort of error recovery
procedure, then it has to provide some synchronization mechanism inside it
between data-path and control-path functions.
Actually looking at eb0d471a894 (ethdev: add proactive error handling
mode), and following patches I wonder how it creeped in?
It seems we just introduced a loophole for race condition with this
approach...


Could you try to describe the specific scenario of loophole ?


Ok, as I understand the existing mechanism:

When PMD wants to start a recovery it has to:
  - invoke  rte_eth_dev_callback_process(RTE_ETH_EVENT_ERR_RECOVERING);
    That supposed to call user provided callback. After callback is finished 
PMD assumes
    that user is aware that recovery is about to start and should make some 
precautions.
- when recovery is finished it invokes another callback:
   RTE_ETH_EVENT_RECOVERY_(SUCCESS/FAILED). After that user either can continue 
to
   use port or have to treat is as faulty.

The idea is ok in principle, but there is a problem.

lib/ethdev/rte_ethdev.h:

/** Port recovering from a hardware or firmware error.

          * If PMD supports proactive error recovery,
          * it should trigger this event to notify application
          * that it detected an error and the recovery is being started.

<<< !!!!!
          * Upon receiving the event, the application should not invoke any 
control path API
          * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
          * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED 
event.
          * The PMD will set the data path pointers to dummy functions,
          * and re-set the data path pointers to non-dummy functions
          * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
<<< !!!!!

That part is just wrong I believe.
It should be:
Upon receiving the event, the application should not invoke any *both control 
and data-path* API
until receiving  RTE_ETH_EVENT_RECOVERY_SUCCESS or 
RTE_ETH_EVENT_RECOVERY_FAILED event.
Resetting data path pointers to dummy functions by PMD *before* invoking
rte_eth_dev_callback_process(RTE_ETH_EVENT_ERR_RECOVERING);
introduces a race-condition with data-path threads, as such thread could 
already be inside RX/TX function
or can already read RX/TX function/data pointers and be about to use them.


Current practices: the PMDs already add some delay after set Rx/Tx callback to 
dummy, and plus the DPDK
worker thread is busypolling, the probability of occurence in reality is zero. 
But in theoretically exist
the above race-condition.



Adding delay might make a problem a bit less reproducible,
but it doesn't fix it.
The bug is still there.

And right now rte_ethdev layer doesn't provide any mechanism to check it or 
wait when they'll finish, etc.

Yes


So, probably the simplest way to fix it with existing DPDK design:
- user level callback  RTE_ETH_EVENT_ERR_RECOVERING should return only after it 
ensures that *all*
   application threads (and processes) stopped using either control or 
data-path functions for that port


Agree

   (yes it means that application that wants to use this feature has to provide 
its own synchronization mechanism
   around data-path functions (RX/TX) that it is going to use).
- after that PMD is safe to reset rte_eth_fp_ops[] values to dummy ones.

And message to all PMD developers:
*please stop updating rte_eth_fp_ops[] on your own*.
That's a bad practice and it is not supposed to do things that way.
There is a special API provided for these purposes:
eth_dev_fp_ops_reset(), eth_dev_fp_ops_setup(), so use it.


This two function is in private.h, so it should be expose to public header file.


You mean we need to move these functions declarations into ethdev_driver.h?
If so, then yes, I think we probably do.


BTW,  I don't see any implementation for RTE_ETH_EVENT_ERR_RECOVERING within
either testpmd or any other example apps.
Am I missing something?


Currently it just promote the event.



Ok, can I suggest then to add a proper usage for into in testpmd?
It looks really strange that we add new feature into ethdev (and 2 PMDs),
but didn't provide any way for users to test it.

If not, then probably it could be a good starting point - let's incorporate it 
inside testpmd
(new forwarding engine probably) so everyone can test/try it.

          * It means that the application cannot send or receive any packets
          * during this period.
          * @note Before the PMD reports the recovery result,
          * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
          * because a larger error may occur during the recovery.
          */
         RTE_ETH_EVENT_ERR_RECOVERING,

It probably needs to be either deprecated or reworked.

Looking at the commit, it does not say anything about the data plane functions 
which probably means, the error recovery is

happening within the data plane thread. What happens to other data plane 
threads that are polling the same port on which the error
recovery is happening?

The commit log says: "the PMD sets the data path pointers to dummy functions".

So the data plane threads will receive non-packet and send zero with port which 
in error recovery.


Also, the commit log says that while the error recovery is under progress, the 
application should not call any control plane APIs. Does

that mean, the application has to check for error condition every time it calls 
a control plane API?

If application has not register event (RTE_ETH_EVENT_ERR_RECOVERING) callback, 
it could calls control plane API, but it will return
failed.
If application has register above callback, it can wait for recovery result, or 
direct call without wait but this will return failed.


The commit message also says that "PMD makes sure the control path operations failed 
with retcode -EBUSY". It does not say how it

does this. But, any communication from the PMD thread to control plane thread 
may introduce race conditions if not done correctly.

First there are no PMD thread, do you mean eal-intr-thread ?

As for this question, you can see PMDs which already implement it, they both 
provides mutual exclusion protection.


Would something like this work better?

Note: there is another bug in current code. The check for link state
interrupt and link_ops could return -ENOTSUP and leave device in

indeterminate state.

The check should be done before calling PMD.

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
0266cc82acb6..d6c163ed85e7 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1582,6 +1582,14 @@ rte_eth_dev_start(uint16_t port_id)
                return 0;
        }

+       if (dev->data->dev_conf.intr_conf.lsc == 0 &&
+           dev->dev_ops->link_update == NULL) {
+               RTE_ETHDEV_LOG(INFO,
+                              "Device with port_id=%"PRIu16" link update not

supported\n",

+                              port_id);
+                       return -ENOTSUP;
+       }
+
        ret = rte_eth_dev_info_get(port_id, &dev_info);
        if (ret != 0)
                return ret;
@@ -1591,9 +1599,7 @@ rte_eth_dev_start(uint16_t port_id)
                eth_dev_mac_restore(dev, &dev_info);

        diag = (*dev->dev_ops->dev_start)(dev);
-       if (diag == 0)
-               dev->data->dev_started = 1;
-       else
+       if (diag != 0)
                return eth_err(port_id, diag);

        ret = eth_dev_config_restore(dev, &dev_info, port_id); @@ -1611,16
+1617,18 @@ rte_eth_dev_start(uint16_t port_id)
                return ret;
        }

-       if (dev->data->dev_conf.intr_conf.lsc == 0) {
-               if (*dev->dev_ops->link_update == NULL)
-                       return -ENOTSUP;
-               (*dev->dev_ops->link_update)(dev, 0);
-       }
-
        /* expose selection of PMD fast-path functions */
        eth_dev_fp_ops_setup(rte_eth_fp_ops + port_id, dev);

+       /* ensure state is set before marking device ready */
+       rte_smp_wmb();
+
        rte_ethdev_trace_start(port_id);
+
+       /* Update current link state */
+       if (dev->data->dev_conf.intr_conf.lsc == 0)
+               (*dev->dev_ops->link_update)(dev, 0);
+
        return 0;
  }


.

Re: [PATCH 2/2] ethdev: fix race condition in fast-path ops setup

Reply via email to