On 3/7/2023 8:25 AM, fengchengwen wrote: > > > On 2023/3/6 19:13, Konstantin Ananyev wrote: >> >> >>>>>>>>>> In the proactive error handling mode, the PMD will set the data path >>>>>>>>>> pointers to dummy functions and then try recovery, in this period the >>>>>>>>>> application may still invoking data path API. This will introduce a >>>>>>>>>> race-condition with data path which may lead to crash [1]. >>>>>>>>>> >>>>>>>>>> Although the PMD added delay after setting data path pointers to >>>>>>>>>> cover >>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't >>>>>>>>>> solve the problem. >>>>>>>>>> >>>>>>>>>> To solve the race-condition problem fundamentally, the following >>>>>>>>>> requirements are added: >>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after >>>>>>>>>> report RTE_ETH_EVENT_ERR_RECOVERING event. >>>>>>>>>> 2. The application should stop data path API invocation when process >>>>>>>>>> the RTE_ETH_EVENT_ERR_RECOVERING event. >>>>>>>>>> 3. The PMD should set the data path pointers to valid functions >>>>>>>>>> before >>>>>>>>>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>>>>>> 4. The application should enable data path API invocation when >>>>>>>>>> process >>>>>>>>>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>>>>>> >>>>>>>> >>>>>>>> How this is solving the race-condition, by pushing responsibility to >>>>>>>> stop data path to application? >>>>>>> >>>>>>> Exactly, it becomes application responsibility to make sure data-path is >>>>>>> stopped/suspended before recovery will continue. >>>>>>> >>>>>> >>>>>> From documentation of the feature: >>>>>> >>>>>> `` >>>>>> Because the PMD recovers automatically, >>>>>> the application can only sense that the data flow is disconnected for a >>>>>> while and the control API returns an error in this period. >>>>>> >>>>>> In order to sense the error happening/recovering, as well as to restore >>>>>> some additional configuration, three events are available: >>>>>> `` >>>>>> >>>>>> It looks like initial design is to use events mainly inform application >>>>>> about what happened and mainly for re-configuration. >>>>>> >>>>>> Although I am don't disagree to involve the application, I am not sure >>>>>> that is part of current design. >>>>> >>>>> I thought we all agreed that initial design contain some fallacies that >>>>> need to fixed, no? >>>>> Statement that with current rte_ethdev design error recovery can be done >>>>> without interaction with the app (to stop/suspend data/control path) >>>>> is the main one I think. >>>>> It needs some interaction with app layer, one way or another. >>>>> >>>>>>>> >>>>>>>> What if application is not interested in recovery modes at all and not >>>>>>>> registered any callback for the recovery? >>>>>>> >>>>>>> >>>>>>> Are you saying there is no way for application to disable >>>>>>> automatic recovery in PMD if it is not interested >>>>>>> (or can't full-fill per-requesties for it)? >>>>>>> If so, then yes it is a problem and we need to fix it. >>>>>>> I assumed that such mechanism to disable unwanted events already exists, >>>>>>> but I can't find anything. >>>>>>> Wonder what would be the easiest way here - can PMD make a decision >>>>>>> based on callback return value, or do we need a new API to >>>>>>> enable/disable callbacks, or ...? >>>>>>> >>>>>>> >>>>>> >>>>>> As far as I can see automatic recovery is not configurable by app. >>>>>> >>>>>> But that is not all, PMD sends events to application but PMD can't know >>>>>> if application is handling them or not, so with current design PMD can't >>>>>> rely on to app. >>>>> >>>>> Well, PMD invokes user provided callback. >>>>> One way to fix that problem - if there is no callback provided, >>>>> or callback returns an error code - PMD can assume that recovery >>>>> should not be done. >>>>> That is probably not the best design choice, but at least it will allow >>>>> to fix the problem without too many changes and introducing new API. >>>>> That could be sort of a 'quick fix'. >>>>> In a meanwhile we can think about new/better approach for that. >>>>> >>>> >>>> -rc2 for 23.03 is a few days away. >>>> >>>> What do you think to have 'quick fix' as modifying how driver updates >>>> burst ops to prevent the race condition, for this release? > > The 'quick fix', do you mean only update function pointer (without rxq > setting) ? > Currently the PMDs which announced support "proactive error handling mode" > already > do this. >
Yes. I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()' updates all fields in 'rte_eth_fp_ops' but only function pointer seems changed in the driver, resulting only function pointers to be updated. The discussion about race condition started with patch [1], which mentions a crash because of a race condition. Later in discussions, recovery event given as a sample for where the race can occur, that is why we are here. But after above info, although there is race condition and a bigger update (that needs application involvement) is required for recovery mechanism, there is no crash and NO 'quick fix' is required for recovery. @Konstantin, @Chengwen, can you please confirm above understanding is correct? [1] https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kal...@intel.com/ >>>> >>>> And plan a design update for the next release? >>> +1 on the overall approach. >> >> Yep, agree. > > Hope for better solution. > And also, I notice only the openvswitch (from all open-source software which > based-on DPDK) > registers RTE_ETH_EVENT_INTR_RESET callback . > > Therefore, hope we build a recovery framework at the DPDK SDK level and be > compatible > with RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanism. > >> >>> >>>> >>>> >>>>>> >>>>>>>> I think driver should not rely on application for this, unless >>>>>>>> application explicitly says (to driver) that it is handling recovery, >>>>>>>> right now there is no way for driver to know this. >>>>>>> >>>>>>> I think it is visa-versa: >>>>>>> application should not enable auto-recovery if it can't meet >>>>>>> per-requeststies for it (provide appropriate callback). >>>>>>> >>>>>> >>>>>> I agree on above, we are saying similar thing in different perspective. >>>>> >>>>> Ok, that's good we are on the same page. >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>>> Also, this patch introduce a driver internal function >>>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD. >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kal...@intel.com/ >>>>>>>>>> >>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode") >>>>>>>>>> Cc: sta...@dpdk.org >>>>>>>>>> >>>>>>>>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com> >>>>>>>>>> --- >>>>>>>>>> doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++--------- >>>>>>>>>> lib/ethdev/ethdev_driver.c | 8 +++++++ >>>>>>>>>> lib/ethdev/ethdev_driver.h | 10 ++++++++ >>>>>>>>>> lib/ethdev/rte_ethdev.h | 32 >>>>>>>>>> +++++++++++++++---------- >>>>>>>>>> lib/ethdev/version.map | 1 + >>>>>>>>>> 5 files changed, 46 insertions(+), 25 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>> index c145a9066c..e380ff135a 100644 >>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery >>>>>>>>>> in PASSIVE mode, >>>>>>>>>> the PMD automatically recovers from error in PROACTIVE mode, >>>>>>>>>> and only a small amount of work is required for the application. >>>>>>>>>> >>>>>>>>>> -During error detection and automatic recovery, >>>>>>>>>> -the PMD sets the data path pointers to dummy functions >>>>>>>>>> -(which will prevent the crash), >>>>>>>>>> -and also make sure the control path operations fail with a return >>>>>>>>>> code ``-EBUSY``. >>>>>>>>>> - >>>>>>>>>> -Because the PMD recovers automatically, >>>>>>>>>> -the application can only sense that the data flow is disconnected >>>>>>>>>> for a while >>>>>>>>>> -and the control API returns an error in this period. >>>>>>>>>> +During error detection and automatic recovery, the PMD sets the >>>>>>>>>> data path >>>>>>>>>> +pointers to dummy functions and also make sure the control path >>>>>>>>>> operations >>>>>>>>>> +failed with a return code ``-EBUSY``. >>>>>>>>>> >>>>>>>>>> In order to sense the error happening/recovering, >>>>>>>>>> as well as to restore some additional configuration, >>>>>>>>>> @@ -653,9 +648,9 @@ three events are available: >>>>>>>>>> >>>>>>>>>> ``RTE_ETH_EVENT_ERR_RECOVERING`` >>>>>>>>>> Notify the application that an error is detected >>>>>>>>>> - and the recovery is being started. >>>>>>>>>> + and the recovery is about to start. >>>>>>>>>> Upon receiving the event, the application should not invoke >>>>>>>>>> - any control path function until receiving >>>>>>>>>> + any control and data path API until receiving >>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or >>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event. >>>>>>>>>> >>>>>>>>>> .. note:: >>>>>>>>>> @@ -666,8 +661,9 @@ three events are available: >>>>>>>>>> >>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` >>>>>>>>>> Notify the application that the recovery from error is >>>>>>>>>> successful, >>>>>>>>>> - the PMD already re-configures the port, >>>>>>>>>> - and the effect is the same as a restart operation. >>>>>>>>>> + the PMD already re-configures the port. >>>>>>>>>> + The application should restore some additional configuration, >>>>>>>>>> and then >>>>>>>>>> + enable data path API invocation. >>>>>>>>>> >>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` >>>>>>>>>> Notify the application that the recovery from error failed, >>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c >>>>>>>>>> index 0be1e8ca04..f994653fe9 100644 >>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c >>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c >>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev >>>>>>>>>> *dev, const char *ring_name, >>>>>>>>>> return rc; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> +void >>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) >>>>>>>>>> +{ >>>>>>>>>> + if (dev == NULL) >>>>>>>>>> + return; >>>>>>>>>> + eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> const struct rte_memzone * >>>>>>>>>> rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char >>>>>>>>>> *ring_name, >>>>>>>>>> uint16_t queue_id, size_t size, unsigned int align, >>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h >>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644 >>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h >>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h >>>>>>>>>> @@ -1621,6 +1621,16 @@ int >>>>>>>>>> rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const >>>>>>>>>> char *name, >>>>>>>>>> uint16_t queue_id); >>>>>>>>>> >>>>>>>>>> +/** >>>>>>>>>> + * @internal >>>>>>>>>> + * Setup eth fast-path API to ethdev values. >>>>>>>>>> + * >>>>>>>>>> + * @param dev >>>>>>>>>> + * Pointer to struct rte_eth_dev. >>>>>>>>>> + */ >>>>>>>>>> +__rte_internal >>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev); >>>>>>>>>> + >>>>>>>>>> /** >>>>>>>>>> * @internal >>>>>>>>>> * Atomically set the link status for the specific device. >>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h >>>>>>>>>> index 049641d57c..44ee7229c1 100644 >>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h >>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h >>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type { >>>>>>>>>> */ >>>>>>>>>> RTE_ETH_EVENT_RX_AVAIL_THRESH, >>>>>>>>>> /** Port recovering from a hardware or firmware error. >>>>>>>>>> - * If PMD supports proactive error recovery, >>>>>>>>>> - * it should trigger this event to notify application >>>>>>>>>> - * that it detected an error and the recovery is being started. >>>>>>>>>> - * Upon receiving the event, the application should not invoke >>>>>>>>>> any control path API >>>>>>>>>> - * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until >>>>>>>>>> receiving >>>>>>>>>> - * RTE_ETH_EVENT_RECOVERY_SUCCESS or >>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event. >>>>>>>>>> - * The PMD will set the data path pointers to dummy functions, >>>>>>>>>> - * and re-set the data path pointers to non-dummy functions >>>>>>>>>> - * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>>>>>> - * It means that the application cannot send or receive any >>>>>>>>>> packets >>>>>>>>>> - * during this period. >>>>>>>>>> + * >>>>>>>>>> + * If PMD supports proactive error recovery, it should trigger >>>>>>>>>> this >>>>>>>>>> + * event to notify application that it detected an error and the >>>>>>>>>> + * recovery is about to start. >>>>>>>>>> + * >>>>>>>>>> + * Upon receiving the event, the application should not invoke >>>>>>>>>> any >>>>>>>>>> + * control and data path API until receiving >>>>>>>>>> + * RTE_ETH_EVENT_RECOVERY_SUCCESS or >>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED >>>>>>>>>> + * event. >>>>>>>>>> + * >>>>>>>>>> + * Once this event is reported, the PMD will set the data path >>>>>>>>>> pointers >>>>>>>>>> + * to dummy functions, and re-set the data path pointers to >>>>>>>>>> valid >>>>>>>>>> + * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS >>>>>>>>>> event. >>>>>>>>>> + * >>>>>>>>>> * @note Before the PMD reports the recovery result, >>>>>>>>>> * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event >>>>>>>>>> again, >>>>>>>>>> * because a larger error may occur during the recovery. >>>>>>>>>> */ >>>>>>>>>> RTE_ETH_EVENT_ERR_RECOVERING, >>>>>>>>>> /** Port recovers successfully from the error. >>>>>>>>>> - * The PMD already re-configured the port, >>>>>>>>>> - * and the effect is the same as a restart operation. >>>>>>>>>> + * >>>>>>>>>> + * The PMD already re-configured the port: >>>>>>>>>> * a) The following operation will be retained: >>>>>>>>>> (alphabetically) >>>>>>>>>> * - DCB configuration >>>>>>>>>> * - FEC configuration >>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type { >>>>>>>>>> * (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP) >>>>>>>>>> * c) Any other configuration will not be stored >>>>>>>>>> * and will need to be re-configured. >>>>>>>>>> + * >>>>>>>>>> + * The application should restore some additional configuration >>>>>>>>>> + * (see above case b/c), and then enable data path API >>>>>>>>>> invocation. >>>>>>>>>> */ >>>>>>>>>> RTE_ETH_EVENT_RECOVERY_SUCCESS, >>>>>>>>>> /** Port recovery failed. >>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map >>>>>>>>>> index 357d1a88c0..c273e0bdae 100644 >>>>>>>>>> --- a/lib/ethdev/version.map >>>>>>>>>> +++ b/lib/ethdev/version.map >>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL { >>>>>>>>>> rte_eth_devices; >>>>>>>>>> rte_eth_dma_zone_free; >>>>>>>>>> rte_eth_dma_zone_reserve; >>>>>>>>>> + rte_eth_fp_ops_setup; >>>>>>>>>> rte_eth_hairpin_queue_peer_bind; >>>>>>>>>> rte_eth_hairpin_queue_peer_unbind; >>>>>>>>>> rte_eth_hairpin_queue_peer_update; >>>>>>>>>> -- >>>>>>>>> Acked-by: Konstantin Ananyev <konstantin.anan...@huawei.com> >>>>>>>>> >>>>>>>>>> 2.17.1 >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>>