Hello Matan, I think the commit log could be shorter. Proposing this, feel free to expand it if you prefer.
---8<--- When removing a device, the fail-safe checks that it is not within its datapath before cleaning it. When checking whether an Rx burst should be performed on a device, the remove flag is not checked. Thus the port could still enter its datapath and miss a removal round. Furthermore, there is a race between the thread removing the device and the polling thread. Check the remove flag before entering a sub-device Rx burst when in safe mode. This check mitigates the aforementioned race condition. --->8--- Otherwise, On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote: > In case of plug out, the RMV interrupt callback sets the remove flag of > the removed sub-device. The next hotplug alarm cycle should read this > flag and if the data path are clean it should remove the sub-device. > > In case of fail-safe RX burst calling from application, fail-afe tries > to call to all STARTED sub-device rx_burst functions. The remove flag > is not checked here and fail-safe may call to the removed sub-device > rx_burst function. > > The above 2 cases run in different threads and there is a race between > the removed sub-device RX clean check to the removed sub-device > rx_burst call makes the sub device RX unclean. > > If the application calls to rx_burst in loop, the probability to get RX > clean is not enough, especially when there are few sub-devices or if the > rx_burst function of the removed sub-device takes a lot of time. > > Each time the sub-device data path is unclean, the second oportunity to > check it again should be only in the hotplug alarm next cycle; the > default time between cycles is 2 seconds. > > In this loop when fail-safe tries to remove the sub-device, the > sub-device may appear back and fail-safe cannot plug it in back until > the removal process is completted. In this time fail-safe may lose the > primary sub-device services and may hurt application performance. > > This patch adds a remove flag check in safe rx_burst function. > By this way, at most one more hotplug alarm cycle is necessary > to get the sub-device clean for actual removal. > > Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions") > Cc: sta...@dpdk.org > > Signed-off-by: Matan Azrad <ma...@mellanox.com> Acked-by: Gaetan Rivet <gaetan.ri...@6wind.com> > --- > drivers/net/failsafe/failsafe_rxtx.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/failsafe/failsafe_rxtx.c > b/drivers/net/failsafe/failsafe_rxtx.c > index 7311421..70157c8 100644 > --- a/drivers/net/failsafe/failsafe_rxtx.c > +++ b/drivers/net/failsafe/failsafe_rxtx.c > @@ -43,7 +43,8 @@ > { > return (ETH(sdev) == NULL) || > (ETH(sdev)->rx_pkt_burst == NULL) || > - (sdev->state != DEV_STARTED); > + (sdev->state != DEV_STARTED) || > + (sdev->remove != 0); > } > > static inline int > -- > 1.8.3.1 > -- Gaëtan Rivet 6WIND