On Thu, Dec 14, 2017 at 10:40:22AM +0000, Matan Azrad wrote:
> Hi Gaetan
>
> <snip>
> > >
> > > If you add this check in the iterator itself, you would skip removed
> > > devices before attempting operating upon them, right?
> > >
> > > Then it should probably help with your issue, unless you tested it and
> > > verified that it didnt?
> > >
> > > Something like this:
> > >
> > > ---8<---
> > >
> > > diff --git a/drivers/net/failsafe/failsafe_private.h
> > > b/drivers/net/failsafe/failsafe_private.h
> > > index d81cc3ca6..62ddc0689 100644
> > > --- a/drivers/net/failsafe/failsafe_private.h
> > > +++ b/drivers/net/failsafe/failsafe_private.h
> > > @@ -316,8 +316,12 @@ fs_find_next(struct rte_eth_dev *dev,
> > >         subs = PRIV(dev)->subs;
> > >         tail = PRIV(dev)->subs_tail;
> > >         while (sid < tail) {
> > > +               if (min_state > DEV_PROBED &&
> > > +                   fs_is_removed(&sub[sid]))
> > > +                       goto next;
> > >                 if (subs[sid].state >= min_state)
> > >                         break;
> > > +next:
> > >                 sid++;
> > >         }
> > >         *sid_out = sid;
> > >
> > > --->8---
> > >
> > > Only issue being that it is completely racy, but as this MT-unsafe
> > > property is inescapable we might as well ignore it and go for KISS.
> > >
> > > If that's enough, I would prefer instead of having this additional
> > > check added to all rte_eth operations.
> > >
> >
> > Ok, actually you were right here to do it this way. The "is_removed"
> > check needs to happen after the operation attempt to effectively mitigate
> > the possible race. Checking before attempting the call will be much less
> > effective.
> >
> > That being said, would it be cleaner to have eth_dev ops return -ENODEV
> > directly, and check against it within fail-safe?
> >
>
> I think that according to "is_removed" semantic we must return a Boolean
> value (Each value different from '0' means that the device is removed) like
> other functions in c library (for example isspace()).
>

Sure, I wasn't discussing the interface proposed by rte_eth_dev_is_removed().
What I meant to ask was whether the rte_eth_dev_is_removed() check would be
more useful in the ethdev layer itself: if the driver supports the check and
it signals that the port is removed, the eth_dev_ops would return -ENODEV
regardless of the previous error.

I think this information could be useful to other systems, not just
fail-safe.

-- 
Gaëtan Rivet
6WIND
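
To make the ethdev-layer idea concrete, here is a rough, untested-against-DPDK sketch. Every name in it (struct sketch_dev, sketch_eth_err(), the mock_* driver) is a hypothetical stand-in, not existing ethdev code. It only shows the shape: call the driver op first, and if it failed, ask the optional is_removed op and return -ENODEV when the port is gone.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real ethdev structures. */
struct sketch_dev;

struct sketch_dev_ops {
	int (*link_update)(struct sketch_dev *dev); /* example operation */
	int (*is_removed)(struct sketch_dev *dev);  /* optional, may be NULL */
};

struct sketch_dev {
	const struct sketch_dev_ops *ops;
	int removed; /* mock hardware state, read by the mock driver only */
};

/*
 * Translate a driver return value. The removal check runs only after
 * the operation has failed, and only if the driver implements it.
 * A removed port yields -ENODEV regardless of the original error.
 */
static int
sketch_eth_err(struct sketch_dev *dev, int ret)
{
	if (ret == 0)
		return 0;
	if (dev->ops->is_removed != NULL && dev->ops->is_removed(dev))
		return -ENODEV;
	return ret;
}

/* What an ethdev-level wrapper around one op could look like. */
static int
sketch_link_update(struct sketch_dev *dev)
{
	return sketch_eth_err(dev, dev->ops->link_update(dev));
}

/* Mock driver: the op fails with -EIO once the device is gone. */
static int
mock_link_update(struct sketch_dev *dev)
{
	return dev->removed ? -EIO : 0;
}

static int
mock_is_removed(struct sketch_dev *dev)
{
	return dev->removed;
}

static const struct sketch_dev_ops mock_ops = {
	.link_update = mock_link_update,
	.is_removed = mock_is_removed,
};
```

With something along these lines, fail-safe (or any other caller) would only need to compare the return value against -ENODEV, and drivers without an is_removed op would keep returning their original error.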