> > >>>>>>>>> In the proactive error handling mode, the PMD will set the data
> > >>>>>>>>> path pointers to dummy functions and then try recovery; in this
> > >>>>>>>>> period the application may still be invoking the data path API.
> > >>>>>>>>> This will introduce a race condition with the data path, which
> > >>>>>>>>> may lead to a crash [1].
> > >>>>>>>>>
> > >>>>>>>>> Although the PMD added a delay after setting the data path
> > >>>>>>>>> pointers to cover the above race condition, this only reduces
> > >>>>>>>>> the probability; it doesn't solve the problem.
> > >>>>>>>>>
> > >>>>>>>>> To solve the race condition fundamentally, the following
> > >>>>>>>>> requirements are added:
> > >>>>>>>>> 1. The PMD should set the data path pointers to dummy functions
> > >>>>>>>>>    after reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>> 2. The application should stop data path API invocation when
> > >>>>>>>>>    processing the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>> 3. The PMD should set the data path pointers to valid functions
> > >>>>>>>>>    before reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>> 4. The application should enable data path API invocation when
> > >>>>>>>>>    processing the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>> How is this solving the race condition? By pushing the
> > >>>>>>> responsibility to stop the data path to the application?
> > >>>>>>
> > >>>>>> Exactly, it becomes the application's responsibility to make sure
> > >>>>>> the data path is stopped/suspended before recovery continues.
> > >>>>>>
> > >>>>>
> > >>>>> From the documentation of the feature:
> > >>>>>
> > >>>>> ``
> > >>>>> Because the PMD recovers automatically,
> > >>>>> the application can only sense that the data flow is disconnected
> > >>>>> for a while and the control API returns an error in this period.
> > >>>>>
> > >>>>> In order to sense the error happening/recovering, as well as to
> > >>>>> restore some additional configuration, three events are available:
> > >>>>> ``
> > >>>>>
> > >>>>> It looks like the initial design uses events mainly to inform the
> > >>>>> application about what happened, and mainly for re-configuration.
> > >>>>>
> > >>>>> Although I don't disagree with involving the application, I am not
> > >>>>> sure that is part of the current design.
> > >>>>
> > >>>> I thought we all agreed that the initial design contains some
> > >>>> fallacies that need to be fixed, no?
> > >>>> The statement that, with the current rte_ethdev design, error
> > >>>> recovery can be done without interaction with the app (to
> > >>>> stop/suspend the data/control path) is the main one, I think.
> > >>>> It needs some interaction with the app layer, one way or another.
> > >>>>
> > >>>>>>>
> > >>>>>>> What if the application is not interested in recovery modes at all
> > >>>>>>> and has not registered any callback for the recovery?
> > >>>>>>
> > >>>>>>
> > >>>>>> Are you saying there is no way for the application to disable
> > >>>>>> automatic recovery in the PMD if it is not interested
> > >>>>>> (or can't fulfill the prerequisites for it)?
> > >>>>>> If so, then yes, it is a problem and we need to fix it.
> > >>>>>> I assumed that such a mechanism to disable unwanted events already
> > >>>>>> exists, but I can't find anything.
> > >>>>>> I wonder what would be the easiest way here: can the PMD make a
> > >>>>>> decision based on the callback return value, or do we need a new
> > >>>>>> API to enable/disable callbacks, or ...?
> > >>>>>>
> > >>>>>
> > >>>>> As far as I can see, automatic recovery is not configurable by the
> > >>>>> app.
> > >>>>>
> > >>>>> But that is not all: the PMD sends events to the application, but
> > >>>>> the PMD can't know whether the application is handling them, so
> > >>>>> with the current design the PMD can't rely on the app.
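To make requirements 2 and 4 of the patch more concrete, here is a rough, self-contained C model of what a "properly written" application callback could look like. It does not use DPDK. The names `enum model_event`, `dp_worker()`, `recovery_event_cb()` and `run_recovery_demo()` are hypothetical. A real application would register its callback with `rte_eth_dev_callback_register()` for the three `RTE_ETH_EVENT_*` values. The point the model tries to show is that, on ERR_RECOVERING, the callback should not return until no burst call is in flight. This assumes the callback runs on a control thread separate from the data-path lcores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_ETH_EVENT_ERR_RECOVERING,
 * RTE_ETH_EVENT_RECOVERY_SUCCESS and RTE_ETH_EVENT_RECOVERY_FAILED. */
enum model_event {
	MODEL_EVENT_ERR_RECOVERING,
	MODEL_EVENT_RECOVERY_SUCCESS,
	MODEL_EVENT_RECOVERY_FAILED,
};

static atomic_bool dp_paused;    /* written by the event callback */
static atomic_bool in_burst;     /* written by the data-path worker */
static atomic_bool worker_exit;  /* asks the worker thread to return */
static atomic_ulong burst_count; /* number of simulated burst calls */

/* Data-path lcore: publish in_burst BEFORE reading dp_paused (seq_cst,
 * Dekker-style), so either the callback sees a burst in flight and waits,
 * or the worker sees the pause and skips the burst. */
static void *dp_worker(void *arg)
{
	(void)arg;
	while (!atomic_load(&worker_exit)) {
		atomic_store(&in_burst, true);
		if (atomic_load(&dp_paused)) {
			atomic_store(&in_burst, false);
			sched_yield();
			continue;
		}
		/* A real lcore would call rte_eth_rx_burst()/rte_eth_tx_burst() here. */
		atomic_fetch_add(&burst_count, 1);
		atomic_store(&in_burst, false);
	}
	return NULL;
}

/* Same argument shape as an ethdev event callback. */
static int recovery_event_cb(uint16_t port_id, enum model_event event,
			     void *cb_arg, void *ret_param)
{
	(void)port_id;
	(void)cb_arg;
	(void)ret_param;
	switch (event) {
	case MODEL_EVENT_ERR_RECOVERING:
		/* Req. 2: stop bursts; return only when none is in flight. */
		atomic_store(&dp_paused, true);
		while (atomic_load(&in_burst))
			sched_yield();
		break;
	case MODEL_EVENT_RECOVERY_SUCCESS:
		/* Req. 4: restore non-retained configuration, then resume. */
		atomic_store(&dp_paused, false);
		break;
	case MODEL_EVENT_RECOVERY_FAILED:
		/* Keep the data path stopped; the port needs other handling. */
		break;
	}
	return 0;
}

/* Returns 0 if no burst ran while paused and bursts resumed afterwards. */
static int run_recovery_demo(void)
{
	pthread_t tid;
	unsigned long snap;
	int rc = 0;
	int i;

	atomic_store(&dp_paused, false);
	atomic_store(&in_burst, false);
	atomic_store(&worker_exit, false);
	atomic_store(&burst_count, 0);
	if (pthread_create(&tid, NULL, dp_worker, NULL) != 0)
		return -1;
	while (atomic_load(&burst_count) == 0)    /* wait until worker runs */
		sched_yield();

	recovery_event_cb(0, MODEL_EVENT_ERR_RECOVERING, NULL, NULL);
	snap = atomic_load(&burst_count);
	for (i = 0; i < 10000; i++)               /* give the worker time */
		sched_yield();
	if (atomic_load(&burst_count) != snap)
		rc = 1;                           /* burst ran during "recovery" */

	recovery_event_cb(0, MODEL_EVENT_RECOVERY_SUCCESS, NULL, NULL);
	while (atomic_load(&burst_count) == snap) /* bursts resume */
		sched_yield();

	atomic_store(&worker_exit, true);
	pthread_join(tid, NULL);
	return rc;
}
```

The `in_burst`/`dp_paused` handshake is just one way to get the quiescence the patch asks for. An RCU-style quiescent-state scheme, for example DPDK's `rte_rcu_qsbr` library, would be another option.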
> > >>>>
> > >>>> Well, the PMD invokes the user-provided callback.
> > >>>> One way to fix that problem: if no callback is provided,
> > >>>> or the callback returns an error code, the PMD can assume that
> > >>>> recovery should not be done.
> > >>>> That is probably not the best design choice, but at least it will
> > >>>> allow us to fix the problem without too many changes and without
> > >>>> introducing a new API. That could be a sort of 'quick fix'.
> > >>>> In the meantime we can think about a new/better approach for that.
> > >>>>
> > >>>
> > >>> -rc2 for 23.03 is a few days away.
> > >>>
> > >>> What do you think about having a 'quick fix' that modifies how the
> > >>> driver updates burst ops to prevent the race condition, for this
> > >>> release?
> >
> > By the 'quick fix', do you mean only updating the function pointer
> > (without the rxq setting)?
> > Currently the PMDs which announced support for "proactive error
> > handling mode" already do this.
>
> Really sorry guys, I was too fast on the keyboard and didn't properly
> read what Ferruh suggested.
> Reading it once again: no, I do not agree with that.
> It wouldn't fix anything, but would just add extra mess to the code.
> Sorry again for the wrong reply.
> Konstantin
>
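The "no callback, or a callback that returns an error, means no recovery" idea could be sketched roughly as follows. This is only a model. `struct model_port`, `model_try_start_recovery()` and the two example callbacks are hypothetical. A real PMD would need some way to inspect the ethdev callback list, and no public helper for that is assumed here. Requiring a RECOVERY_SUCCESS handler as well as an ERR_RECOVERING handler is an added assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three recovery events. */
enum model_event {
	MODEL_EVENT_ERR_RECOVERING,
	MODEL_EVENT_RECOVERY_SUCCESS,
	MODEL_EVENT_RECOVERY_FAILED,
	MODEL_EVENT_MAX,
};

typedef int (*model_event_cb)(uint16_t port_id, enum model_event event,
			      void *cb_arg, void *ret_param);

/* Hypothetical per-port record of what the application registered. */
struct model_port {
	uint16_t port_id;
	model_event_cb cb[MODEL_EVENT_MAX];
	void *cb_arg[MODEL_EVENT_MAX];
	int recovering;
};

/* "Quick fix" policy: no callback, or a callback returning an error,
 * means the PMD does not start proactive recovery at all. */
static int model_try_start_recovery(struct model_port *p)
{
	model_event_cb cb = p->cb[MODEL_EVENT_ERR_RECOVERING];

	/* Assumption: without a SUCCESS handler the app could never
	 * resume its data path, so require both handlers. */
	if (cb == NULL || p->cb[MODEL_EVENT_RECOVERY_SUCCESS] == NULL)
		return -ENOTSUP;
	if (cb(p->port_id, MODEL_EVENT_ERR_RECOVERING,
	       p->cb_arg[MODEL_EVENT_ERR_RECOVERING], NULL) != 0)
		return -ECANCELED;
	p->recovering = 1; /* from here the PMD may install dummy burst ops */
	return 0;
}

/* Example application callbacks: one accepts recovery, one refuses it. */
static int app_accept_cb(uint16_t port_id, enum model_event event,
			 void *cb_arg, void *ret_param)
{
	(void)port_id; (void)event; (void)cb_arg; (void)ret_param;
	return 0;
}

static int app_refuse_cb(uint16_t port_id, enum model_event event,
			 void *cb_arg, void *ret_param)
{
	(void)port_id; (void)event; (void)cb_arg; (void)ret_param;
	return -1;
}
```

With this policy, an application that never registers a callback keeps today's behaviour of "no automatic recovery", instead of having its data path pointers swapped underneath it.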
Thinking about the 'quick fix' once again: I think the patches Fengchengwen
already provided (https://patchwork.dpdk.org/project/dpdk/list/?series=27201)
are a much better approach. I believe they should stop the race condition
(and the crashes), given a properly written callback.
If we still have time for it, I'd suggest one extra change in the PMD:
check that a recovery callback is installed, and if it is not, simply do
not start recovery at all.

> >
> > >>>
> > >>> And plan a design update for the next release?
> > >> +1 on the overall approach.
> > >
> > > Yep, agree.
> >
> > Hoping for a better solution.
> > Also, I notice that only openvswitch (of all the open-source software
> > based on DPDK) registers an RTE_ETH_EVENT_INTR_RESET callback.
> >
> > Therefore, I hope we build a recovery framework at the DPDK SDK level
> > that is compatible with both the RTE_ETH_EVENT_INTR_RESET and
> > RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
> >
> > >>>
> > >>>>>
> > >>>>>>> I think the driver should not rely on the application for this
> > >>>>>>> unless the application explicitly tells the driver that it is
> > >>>>>>> handling recovery; right now there is no way for the driver to
> > >>>>>>> know this.
> > >>>>>>
> > >>>>>> I think it is vice versa:
> > >>>>>> the application should not enable auto-recovery if it can't meet
> > >>>>>> the prerequisites for it (provide an appropriate callback).
> > >>>>>>
> > >>>>>
> > >>>>> I agree with the above; we are saying similar things from
> > >>>>> different perspectives.
> > >>>>
> > >>>> OK, good that we are on the same page.
> > >>>>
> > >>>>>>>
> > >>>>>>>>> Also, this patch introduces a driver-internal function,
> > >>>>>>>>> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
> > >>>>>>>>>
> > >>>>>>>>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kal...@intel.com/
> > >>>>>>>>>
> > >>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>>>>>>> Cc: sta...@dpdk.org
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com>
> > >>>>>>>>> ---
> > >>>>>>>>>  doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>>>>>>  lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>>>>>>  lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>>>>>>  lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> > >>>>>>>>>  lib/ethdev/version.map                  |  1 +
> > >>>>>>>>>  5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> index c145a9066c..e380ff135a 100644
> > >>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> > >>>>>>>>>  the PMD automatically recovers from error in PROACTIVE mode,
> > >>>>>>>>>  and only a small amount of work is required for the application.
> > >>>>>>>>>  
> > >>>>>>>>> -During error detection and automatic recovery,
> > >>>>>>>>> -the PMD sets the data path pointers to dummy functions
> > >>>>>>>>> -(which will prevent the crash),
> > >>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> > >>>>>>>>> -
> > >>>>>>>>> -Because the PMD recovers automatically,
> > >>>>>>>>> -the application can only sense that the data flow is disconnected for a while
> > >>>>>>>>> -and the control API returns an error in this period.
> > >>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
> > >>>>>>>>> +pointers to dummy functions and also make sure the control path operations
> > >>>>>>>>> +failed with a return code ``-EBUSY``.
> > >>>>>>>>>  
> > >>>>>>>>>  In order to sense the error happening/recovering,
> > >>>>>>>>>  as well as to restore some additional configuration,
> > >>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
> > >>>>>>>>>  
> > >>>>>>>>>  ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>>>>>>     Notify the application that an error is detected
> > >>>>>>>>> -   and the recovery is being started.
> > >>>>>>>>> +   and the recovery is about to start.
> > >>>>>>>>>     Upon receiving the event, the application should not invoke
> > >>>>>>>>> -   any control path function until receiving
> > >>>>>>>>> +   any control and data path API until receiving
> > >>>>>>>>>     ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>>>>>>  
> > >>>>>>>>>  .. note::
> > >>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>>>>>>  
> > >>>>>>>>>  ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>>>>>>     Notify the application that the recovery from error is successful,
> > >>>>>>>>> -   the PMD already re-configures the port,
> > >>>>>>>>> -   and the effect is the same as a restart operation.
> > >>>>>>>>> +   the PMD already re-configures the port.
> > >>>>>>>>> +   The application should restore some additional configuration, and then
> > >>>>>>>>> +   enable data path API invocation.
> > >>>>>>>>>  
> > >>>>>>>>>  ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>>>>>>     Notify the application that the recovery from error failed,
> > >>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>> index 0be1e8ca04..f994653fe9 100644
> > >>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>>>>  	return rc;
> > >>>>>>>>>  }
> > >>>>>>>>>  
> > >>>>>>>>> +void
> > >>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> > >>>>>>>>> +{
> > >>>>>>>>> +	if (dev == NULL)
> > >>>>>>>>> +		return;
> > >>>>>>>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>  const struct rte_memzone *
> > >>>>>>>>>  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>>>>  			 uint16_t queue_id, size_t size, unsigned int align,
> > >>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
> > >>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>> @@ -1621,6 +1621,16 @@ int
> > >>>>>>>>>  rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> > >>>>>>>>>  		      uint16_t queue_id);
> > >>>>>>>>>  
> > >>>>>>>>> +/**
> > >>>>>>>>> + * @internal
> > >>>>>>>>> + * Setup eth fast-path API to ethdev values.
> > >>>>>>>>> + *
> > >>>>>>>>> + * @param dev
> > >>>>>>>>> + *   Pointer to struct rte_eth_dev.
> > >>>>>>>>> + */
> > >>>>>>>>> +__rte_internal
> > >>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>>>>>>> +
> > >>>>>>>>>  /**
> > >>>>>>>>>   * @internal
> > >>>>>>>>>   * Atomically set the link status for the specific device.
> > >>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>> index 049641d57c..44ee7229c1 100644
> > >>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>>>>>>  	 */
> > >>>>>>>>>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>>>>>>  	/** Port recovering from a hardware or firmware error.
> > >>>>>>>>> -	 * If PMD supports proactive error recovery,
> > >>>>>>>>> -	 * it should trigger this event to notify application
> > >>>>>>>>> -	 * that it detected an error and the recovery is being started.
> > >>>>>>>>> -	 * Upon receiving the event, the application should not invoke any control path API
> > >>>>>>>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> > >>>>>>>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>>>>>>> -	 * The PMD will set the data path pointers to dummy functions,
> > >>>>>>>>> -	 * and re-set the data path pointers to non-dummy functions
> > >>>>>>>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>> -	 * It means that the application cannot send or receive any packets
> > >>>>>>>>> -	 * during this period.
> > >>>>>>>>> +	 *
> > >>>>>>>>> +	 * If PMD supports proactive error recovery, it should trigger this
> > >>>>>>>>> +	 * event to notify application that it detected an error and the
> > >>>>>>>>> +	 * recovery is about to start.
> > >>>>>>>>> +	 *
> > >>>>>>>>> +	 * Upon receiving the event, the application should not invoke any
> > >>>>>>>>> +	 * control and data path API until receiving
> > >>>>>>>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>>>>>>> +	 * event.
> > >>>>>>>>> +	 *
> > >>>>>>>>> +	 * Once this event is reported, the PMD will set the data path pointers
> > >>>>>>>>> +	 * to dummy functions, and re-set the data path pointers to valid
> > >>>>>>>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>> +	 *
> > >>>>>>>>>  	 * @note Before the PMD reports the recovery result,
> > >>>>>>>>>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> > >>>>>>>>>  	 * because a larger error may occur during the recovery.
> > >>>>>>>>>  	 */
> > >>>>>>>>>  	RTE_ETH_EVENT_ERR_RECOVERING,
> > >>>>>>>>>  	/** Port recovers successfully from the error.
> > >>>>>>>>> -	 * The PMD already re-configured the port,
> > >>>>>>>>> -	 * and the effect is the same as a restart operation.
> > >>>>>>>>> +	 *
> > >>>>>>>>> +	 * The PMD already re-configured the port:
> > >>>>>>>>>  	 * a) The following operation will be retained: (alphabetically)
> > >>>>>>>>>  	 *    - DCB configuration
> > >>>>>>>>>  	 *    - FEC configuration
> > >>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>>>>>>  	 *    (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>>>>>>  	 * c) Any other configuration will not be stored
> > >>>>>>>>>  	 *    and will need to be re-configured.
> > >>>>>>>>> +	 *
> > >>>>>>>>> +	 * The application should restore some additional configuration
> > >>>>>>>>> +	 * (see above case b/c), and then enable data path API invocation.
> > >>>>>>>>>  	 */
> > >>>>>>>>>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>>>>>>  	/** Port recovery failed.
> > >>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > >>>>>>>>> index 357d1a88c0..c273e0bdae 100644
> > >>>>>>>>> --- a/lib/ethdev/version.map
> > >>>>>>>>> +++ b/lib/ethdev/version.map
> > >>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>>>>>>  	rte_eth_devices;
> > >>>>>>>>>  	rte_eth_dma_zone_free;
> > >>>>>>>>>  	rte_eth_dma_zone_reserve;
> > >>>>>>>>> +	rte_eth_fp_ops_setup;
> > >>>>>>>>>  	rte_eth_hairpin_queue_peer_bind;
> > >>>>>>>>>  	rte_eth_hairpin_queue_peer_unbind;
> > >>>>>>>>>  	rte_eth_hairpin_queue_peer_update;
> > >>>>>>>>> --
> > >>>>>>>> Acked-by: Konstantin Ananyev <konstantin.anan...@huawei.com>
> > >>>>>>>>
> > >>>>>>>>> 2.17.1
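For reference, the ordering the patch asks of the PMD (requirements 1 and 3) can be modelled in a few lines of plain C. Everything in this sketch is hypothetical:

- `model_fp_ops_tbl` stands in for `rte_eth_fp_ops[]`.
- `model_fp_ops_setup()` plays the role of the new `rte_eth_fp_ops_setup()`.
- `app_on_event()` stands in for the application callback.

The sketch only checks that the callback sees the driver's real burst pointer both when ERR_RECOVERING is reported and when RECOVERY_SUCCESS is reported. The dummy function is installed only in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a per-port fast-path table (the role played by
 * rte_eth_fp_ops[]) and of the driver-owned state it is filled from. */
typedef uint16_t (*model_rx_burst_t)(void *rxq, void **pkts, uint16_t n);

struct model_fp_ops { model_rx_burst_t rx_pkt_burst; };
struct model_dev { uint16_t port_id; model_rx_burst_t rx_pkt_burst; };

enum model_event { MODEL_EVENT_ERR_RECOVERING, MODEL_EVENT_RECOVERY_SUCCESS };

#define MODEL_MAX_PORTS 4
static struct model_fp_ops model_fp_ops_tbl[MODEL_MAX_PORTS];

/* What the application callback observed when each event was reported. */
static int valid_at_recovering = -1;
static int valid_at_success = -1;

static uint16_t dummy_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq; (void)pkts; (void)n;
	return 0; /* no packets while the port is recovering */
}

static uint16_t driver_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq; (void)pkts;
	return n; /* pretend a full burst was received */
}

/* Plays the role of rte_eth_fp_ops_setup(): republish the driver's
 * burst functions into the port's fast-path entry. */
static void model_fp_ops_setup(const struct model_dev *dev)
{
	if (dev == NULL)
		return;
	model_fp_ops_tbl[dev->port_id].rx_pkt_burst = dev->rx_pkt_burst;
}

/* Stand-in application callback: records whether the published burst
 * pointer is the driver's real one at the moment the event arrives. */
static void app_on_event(const struct model_dev *dev, enum model_event ev)
{
	int valid = model_fp_ops_tbl[dev->port_id].rx_pkt_burst ==
		    dev->rx_pkt_burst;

	if (ev == MODEL_EVENT_ERR_RECOVERING)
		valid_at_recovering = valid;
	else
		valid_at_success = valid;
}

static void model_pmd_recover(const struct model_dev *dev)
{
	/* Req. 1: report RECOVERING first, so the app stops its lcores
	 * while the pointers they may still be using are valid. */
	app_on_event(dev, MODEL_EVENT_ERR_RECOVERING);
	model_fp_ops_tbl[dev->port_id].rx_pkt_burst = dummy_rx;
	/* ... hardware/firmware reset and reconfiguration would go here ... */
	/* Req. 3: restore valid pointers BEFORE reporting SUCCESS. */
	model_fp_ops_setup(dev);
	app_on_event(dev, MODEL_EVENT_RECOVERY_SUCCESS);
}

/* Returns 0 if bursts work again after the simulated recovery. */
static int run_fp_demo(void)
{
	struct model_dev dev = { .port_id = 1, .rx_pkt_burst = driver_rx };
	void *pkts[8] = { 0 };

	model_fp_ops_setup(&dev); /* normal device start */
	model_pmd_recover(&dev);
	return model_fp_ops_tbl[dev.port_id].rx_pkt_burst(NULL, pkts, 8) == 8 ? 0 : 1;
}
```

The swap to the dummy function is only safe because the application has already stopped calling bursts by the time the ERR_RECOVERING callback returns. The model assumes that rather than enforcing it.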