On Thu, Jan 18, 2018 at 11:22:51PM +0100, Thomas Monjalon wrote:
> 29/11/2017 20:17, Ferruh Yigit:
> > >>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> > >>>> This commit prevents control path operations from failing after a sub
> > >>>> device removal.
> > >>>>
> > >>>> Following are the failure steps:
> > >>>> 1. The physical device is removed due to change in one of PF
> > >>>> parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
> > >>>> Within 2 seconds Interrupt thread initializes the actual device
> > >>>> removal, then every 2 seconds it tries to re-sync (plug in) the
> > >>>> device. The trials fail as long as VF parameter mismatches the PF
> > >>> parameter.
> > >>>> 4. A control thread initiates a control operation on failsafe which
> > >>>> initiates this operation on the device.
> > >>>> 5. A race condition occurs between the control thread and interrupt
> > >>>> thread when accessing the device data structures.
> > >>>>
> > >>>> This commit prevents the race condition in step 5. Before this commit
> > >>>> if a device was removed and then a control thread operation was
> > >>>> initiated on failsafe - in some cases failsafe called the sub device
> > >>>> operation instead of avoiding it. Such cases could lead to operations
> > >>> failures.
> [...]
> >
> > Reminder of this patch remaining from previous release.
> > Gaetan, what is the decision for this possible race condition?
This patchset had several issues that I outlined.

> Can we try to fix it in 18.02?

These patches could go in with a rework. If you feel like it I can review
those fixes in the coming weeks if new versions are submitted.

-- 
Gaëtan Rivet
6WIND