On 10/26/2017 9:20 AM, Gaëtan Rivet wrote: > Hello Matan, > > I think the commit log could be shorter. > Proposing this, feel free to expand it if you prefer. > > ---8<--- > > When removing a device, the fail-safe checks that it is not within its > datapath before cleaning it. > > When checking whether an Rx burst should be performed on a device, the > remove flag is not checked. Thus the port could still enter its datapath > and miss a removal round. Furthermore, there is a race between the > thread removing the device and the polling thread. > > Check the remove flag before entering a sub-device Rx burst when in safe > mode. This check mitigates the aforementioned race condition. > > --->8--- > > Otherwise, > > On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote: >> In case of plug out, the RMV interrupt callback sets the remove flag of >> the removed sub-device. The next hotplug alarm cycle should read this >> flag and if the data path are clean it should remove the sub-device. >> >> In case of fail-safe RX burst calling from application, fail-afe tries >> to call to all STARTED sub-device rx_burst functions. The remove flag >> is not checked here and fail-safe may call to the removed sub-device >> rx_burst function. >> >> The above 2 cases run in different threads and there is a race between >> the removed sub-device RX clean check to the removed sub-device >> rx_burst call makes the sub device RX unclean. >> >> If the application calls to rx_burst in loop, the probability to get RX >> clean is not enough, especially when there are few sub-devices or if the >> rx_burst function of the removed sub-device takes a lot of time. >> >> Each time the sub-device data path is unclean, the second oportunity to >> check it again should be only in the hotplug alarm next cycle; the >> default time between cycles is 2 seconds. >> >> In this loop when fail-safe tries to remove the sub-device, the >> sub-device may appear back and fail-safe cannot plug it in back until >> the removal process is completted. In this time fail-safe may lose the >> primary sub-device services and may hurt application performance. >> >> This patch adds a remove flag check in safe rx_burst function. >> By this way, at most one more hotplug alarm cycle is necessary >> to get the sub-device clean for actual removal. >> >> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Matan Azrad <ma...@mellanox.com> > > Acked-by: Gaetan Rivet <gaetan.ri...@6wind.com>
Applied to dpdk-next-net/master, thanks. (used suggested commit log, thanks.)