Hi,

On 02/10/20 17:01 -0700, Long Li wrote:
> From: Long Li <lon...@microsoft.com>
>
> When adding a sub-device, it's possible that the sub-device is configured
> successfully but later fails to start. This error should not be masked.

Some of those errors are meant to be masked: -EIO, when the device is marked
as removed at the ethdev level (see eth_err() in rte_ethdev.c:819).

> The driver needs to check the error status to prevent endless loop of
> trying to start the sub-device.

If the ethdev layer error is due to the device being removed, and failsafe
loops on trying to sync the eth device to its own state, then an RMV event
should have been emitted but wasn't or it was missed by failsafe.

If the ethdev layer error is *not* due to the device being removed, the
error should be != -EIO, and sdev->remove should not be set, so fs_err()
should not mask it and it should be seen by the app.

Can you provide the following details:

  * What is the return code of rte_eth_dev_start() that is masked in
    your start loop?
  * Is the device marked as removed in failsafe?
  * Is the device marked as removed in ethdev?
  * Was there an RMV event generated for the device?
    Whether yes or no, is it correct?

Thanks,

>
> fixes (ae80146c5a1b net/failsafe: fix removed device handling)
>
> cc: sta...@dpdk.org
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  drivers/net/failsafe/failsafe_private.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> index 651578a..c58c0de 100644
> --- a/drivers/net/failsafe/failsafe_private.h
> +++ b/drivers/net/failsafe/failsafe_private.h
> @@ -497,7 +497,7 @@ int failsafe_eth_new_event_callback(uint16_t port_id,
>  fs_err(struct sub_device *sdev, int err)
>  {
>  	/* A device removal shouldn't be reported as an error. */
> -	if (sdev->remove == 1 || err == -EIO)
> +	if (sdev->remove == 1 && err == -EIO)
>  		return rte_errno = 0;
>  	return err;
>  }
> --
> 1.8.3.1
>

-- 
Gaëtan
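
For reference, the four combinations of sdev->remove and the return code discussed above can be laid out in a small standalone sketch. It is not the failsafe code: struct mock_sub_device, fs_err_current(), fs_err_proposed() and fs_err_compare() are hypothetical names introduced only for illustration, the rte_errno side effect is left out, and -EINVAL stands in for "any error other than -EIO".

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the single field the check reads;
 * the real struct sub_device has many more members. */
struct mock_sub_device {
	unsigned int remove;
};

/* Check as it stands before the patch: mask the error when failsafe
 * has marked the sub-device removed OR the error is -EIO. */
static int
fs_err_current(const struct mock_sub_device *sdev, int err)
{
	if (sdev->remove == 1 || err == -EIO)
		return 0;
	return err;
}

/* Check as proposed by the patch: mask only when both conditions hold. */
static int
fs_err_proposed(const struct mock_sub_device *sdev, int err)
{
	if (sdev->remove == 1 && err == -EIO)
		return 0;
	return err;
}

/* Print what each variant returns for every (remove, err) combination. */
void
fs_err_compare(void)
{
	const int errs[] = { -EIO, -EINVAL };
	unsigned int removed;
	size_t i;

	for (removed = 0; removed <= 1; removed++) {
		for (i = 0; i < sizeof(errs) / sizeof(errs[0]); i++) {
			struct mock_sub_device sdev = { .remove = removed };

			printf("remove=%u err=%d current=%d proposed=%d\n",
			       removed, errs[i],
			       fs_err_current(&sdev, errs[i]),
			       fs_err_proposed(&sdev, errs[i]));
		}
	}
}
```

The two variants differ in exactly two cases: -EIO on a sub-device that failsafe has not marked removed, and a non-EIO error on a sub-device that failsafe has marked removed. The first is the case the questions above are trying to pin down.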