On Thu, Oct 21, 2021, at 23:42, vipul.as...@oracle.com wrote:
> From: Vipul Ashri <vipul.as...@oracle.com>
>
> failsafe crashed while sending early link_update request during
> boot time initialization.
> Based on debugging we found failsafe device was good but sub-
> devices were progressing towards initialization and SUBOPS macro
> where expanding macro gives [partial_dev]->dev_ops->link_update()
> execution of which triggered crash because dev_ops==0. similar
> crash seen at failsafe_eth_dev_close()
>
> Failsafe driver need a separate check for subdevices similar to
> "RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);" which is
> called to almost every eth_dev function.
>
> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Cc: sta...@dpdk.org
> Signed-off-by: Vipul Ashri <vipul.as...@oracle.com>

Hello Vipul,

I'm sorry for the delay, I missed your fix on the mailing list.

IIUC, the issue is that failsafe finished init and received an ethdev
operation call, but one of its sub-devices, although marked DEV_ACTIVE,
has a NULL eth_dev->dev_ops field.

This is really surprising to me, because there aren't many ways for a
sub-device to become DEV_ACTIVE. The only two ways are:

  * by executing 'fs_dev_configure()', which first executes
    rte_eth_dev_configure() on the sub-device and, on error, stops
    *without* setting DEV_ACTIVE. rte_eth_dev_configure() itself
    executes RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV), so it
    would return a negative errno and fs_dev_configure() would abort.

  * by executing 'fs_dev_remove()' when the sub-device was
    'DEV_STARTED' to begin with; it is then demoted to DEV_ACTIVE once
    stopped.

So I don't yet understand how a sub-device can become DEV_ACTIVE while
its eth_dev->dev_ops is NULL. It seems more like a bug, memory
corruption or just an unexpected execution pattern.

Could you describe the execution in more detail? In particular, it
would help to set the failsafe log level to debug with the EAL option:

  --log-level pmd.net.failsafe:debug

for example while using testpmd or your DPDK app. It should show
ethdev-level accesses to the sub-devices, and error values.

Best regards,
-- 
Gaetan Rivet
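
To make the scenario concrete, here is a minimal sketch of the kind of per-sub-device guard the patch description asks for. It is only an illustration under stated assumptions: the mock_* types and functions are hypothetical stand-ins, not the real rte_eth_dev or failsafe definitions, and the state enum only mirrors the DEV_ACTIVE / DEV_STARTED ordering discussed above. It does not explain how dev_ops ends up NULL, it only shows how such a call could fail with -ENODEV instead of crashing.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real DPDK definitions. */
struct mock_eth_dev;

struct mock_dev_ops {
	int (*link_update)(struct mock_eth_dev *dev, int wait_to_complete);
};

struct mock_eth_dev {
	const struct mock_dev_ops *dev_ops; /* NULL in the reported crash */
};

/* Assumed ordering: a later state implies all earlier ones were reached. */
enum mock_dev_state {
	DEV_UNDEFINED,
	DEV_PARSED,
	DEV_PROBED,
	DEV_ACTIVE,
	DEV_STARTED,
};

struct mock_sub_device {
	struct mock_eth_dev *dev;
	enum mock_dev_state state;
};

/*
 * Hypothetical guarded call: return -ENODEV instead of dereferencing
 * a NULL dev_ops (or a missing callback) on a sub-device.
 */
static int
mock_sub_link_update(struct mock_sub_device *sdev, int wait_to_complete)
{
	if (sdev == NULL || sdev->state < DEV_ACTIVE)
		return -ENODEV;
	if (sdev->dev == NULL || sdev->dev->dev_ops == NULL ||
	    sdev->dev->dev_ops->link_update == NULL)
		return -ENODEV;
	return sdev->dev->dev_ops->link_update(sdev->dev, wait_to_complete);
}

/* Trivial callback used to exercise the non-error path. */
static int
mock_link_update_ok(struct mock_eth_dev *dev, int wait_to_complete)
{
	(void)dev;
	(void)wait_to_complete;
	return 0;
}
```

With this sketch, a sub-device marked DEV_ACTIVE whose dev_ops is NULL yields -ENODEV, while one with mock_link_update_ok installed returns 0. A guard like this would hide the symptom, so the debug log is still needed to find out why the state and dev_ops disagree.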