On 2023/3/7 20:07, Ferruh Yigit wrote:
> On 3/7/2023 8:25 AM, fengchengwen wrote:
>>
>>
>> On 2023/3/6 19:13, Konstantin Ananyev wrote:
>>>
>>>
>>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
>>>>>>>>>>> pointers to dummy functions and then try recovery, in this period 
>>>>>>>>>>> the
>>>>>>>>>>> application may still invoking data path API. This will introduce a
>>>>>>>>>>> race-condition with data path which may lead to crash [1].
>>>>>>>>>>>
>>>>>>>>>>> Although the PMD added delay after setting data path pointers to 
>>>>>>>>>>> cover
>>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
>>>>>>>>>>> solve the problem.
>>>>>>>>>>>
>>>>>>>>>>> To solve the race-condition problem fundamentally, the following
>>>>>>>>>>> requirements are added:
>>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions 
>>>>>>>>>>> after
>>>>>>>>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>>>> 2. The application should stop data path API invocation when process
>>>>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions 
>>>>>>>>>>> before
>>>>>>>>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>>> 4. The application should enable data path API invocation when 
>>>>>>>>>>> process
>>>>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How this is solving the race-condition, by pushing responsibility to
>>>>>>>>> stop data path to application?
>>>>>>>>
>>>>>>>> Exactly, it becomes application responsibility to make sure data-path 
>>>>>>>> is
>>>>>>>> stopped/suspended before recovery will continue.
>>>>>>>>
>>>>>>>
>>>>>>> From documentation of the feature:
>>>>>>>
>>>>>>> ``
>>>>>>> Because the PMD recovers automatically,
>>>>>>> the application can only sense that the data flow is disconnected for a
>>>>>>> while and the control API returns an error in this period.
>>>>>>>
>>>>>>> In order to sense the error happening/recovering, as well as to restore
>>>>>>> some additional configuration, three events are available:
>>>>>>> ``
>>>>>>>
>>>>>>> It looks like initial design is to use events mainly inform application
>>>>>>> about what happened and mainly for re-configuration.
>>>>>>>
>>>>>>> Although I am don't disagree to involve the application, I am not sure
>>>>>>> that is part of current design.
>>>>>>
>>>>>> I thought we all agreed that initial design contain some fallacies that
>>>>>> need to fixed, no?
>>>>>> Statement that with current rte_ethdev design error recovery can be done
>>>>>> without interaction with the app (to stop/suspend data/control path)
>>>>>> is the main one I think.
>>>>>> It needs some interaction with app layer, one way or another.
>>>>>>
>>>>>>>>>
>>>>>>>>> What if application is not interested in recovery modes at all and not
>>>>>>>>> registered any callback for the recovery?
>>>>>>>>
>>>>>>>>
>>>>>>>> Are you saying there is no way for application to disable
>>>>>>>> automatic recovery in PMD if it is not interested
>>>>>>>> (or can't full-fill per-requesties for it)?
>>>>>>>> If so, then yes it is a problem and we need to fix it.
>>>>>>>> I assumed that such mechanism to disable unwanted events already 
>>>>>>>> exists,
>>>>>>>> but I can't find anything.
>>>>>>>> Wonder what would be the easiest way here - can PMD make a decision
>>>>>>>> based on callback return value, or do we need a new API to
>>>>>>>> enable/disable callbacks, or ...?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> As far as I can see automatic recovery is not configurable by app.
>>>>>>>
>>>>>>> But that is not all, PMD sends events to application but PMD can't know
>>>>>>> if application is handling them or not, so with current design PMD can't
>>>>>>> rely on to app.
>>>>>>
>>>>>> Well, PMD invokes user provided callback.
>>>>>> One way to fix that problem - if there is no callback provided,
>>>>>> or callback returns an error code - PMD can assume that recovery
>>>>>> should not be done.
>>>>>> That is probably not the best design choice, but at least it will allow
>>>>>> to fix the problem without too many changes and introducing new API.
>>>>>> That could be sort of a 'quick fix'.
>>>>>> In a meanwhile we can think about new/better approach for that.
>>>>>>
>>>>>
>>>>> -rc2 for 23.03 is a few days away.
>>>>>
>>>>> What do you think to have 'quick fix' as modifying how driver updates
>>>>> burst ops to prevent the race condition, for this release?
>>
>> The 'quick fix', do you mean only update function pointer (without rxq 
>> setting) ?
>> Currently the PMDs which announced support "proactive error handling mode" 
>> already
>> do this.
>>
> 
> Yes.
> I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()'
> updates all fields in 'rte_eth_fp_ops' but only function pointer seems
> changed in the driver, resulting only function pointers to be updated.
> 
> The discussion about race condition started with patch [1], which
> mentions a crash because of a race condition. Later in discussions,
> recovery event given as a sample for where the race can occur, that is
> why we are here.
> 
> But after above info, although there is race condition and a bigger
> update (that needs application involvement) is required for recovery
> mechanism, there is no crash and NO 'quick fix' is required for recovery.
> 
> @Konstantin, @Chengwen, can you please confirm above understanding is
> correct?

Yes, that's what.

> 
> 
> 
> [1]
> https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kal...@intel.com/
> 
>>>>>
>>>>> And plan a design update for the next release?
>>>> +1 on the overall approach.
>>>
>>> Yep, agree.
>>
>> Hope for better solution.
>> And also, I notice only the openvswitch (from all open-source software which 
>> based-on DPDK)
>> registers RTE_ETH_EVENT_INTR_RESET callback .
>>
>> Therefore, hope we build a recovery framework at the DPDK SDK level and be 
>> compatible
>> with RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanism.
>>
>>>  
>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>>> I think driver should not rely on application for this, unless
>>>>>>>>> application explicitly says (to driver) that it is handling recovery,
>>>>>>>>> right now there is no way for driver to know this.
>>>>>>>>
>>>>>>>> I think it is visa-versa:
>>>>>>>> application should not enable auto-recovery if it can't meet
>>>>>>>> per-requeststies for it (provide appropriate callback).
>>>>>>>>
>>>>>>>
>>>>>>> I agree on above, we are saying similar thing in different perspective.
>>>>>>
>>>>>> Ok, that's good we are on the same page.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Also, this patch introduce a driver internal function
>>>>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kal...@intel.com/
>>>>>>>>>>>
>>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>>>>>>> Cc: sta...@dpdk.org
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com>
>>>>>>>>>>> ---
>>>>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32
>>>>>>>>>>> +++++++++++++++----------
>>>>>>>>>>>   lib/ethdev/version.map                  |  1 +
>>>>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>>> index c145a9066c..e380ff135a 100644
>>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery
>>>>>>>>>>> in PASSIVE mode,
>>>>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
>>>>>>>>>>>   and only a small amount of work is required for the application.
>>>>>>>>>>>
>>>>>>>>>>> -During error detection and automatic recovery,
>>>>>>>>>>> -the PMD sets the data path pointers to dummy functions
>>>>>>>>>>> -(which will prevent the crash),
>>>>>>>>>>> -and also make sure the control path operations fail with a return
>>>>>>>>>>> code ``-EBUSY``.
>>>>>>>>>>> -
>>>>>>>>>>> -Because the PMD recovers automatically,
>>>>>>>>>>> -the application can only sense that the data flow is disconnected
>>>>>>>>>>> for a while
>>>>>>>>>>> -and the control API returns an error in this period.
>>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the
>>>>>>>>>>> data path
>>>>>>>>>>> +pointers to dummy functions and also make sure the control path
>>>>>>>>>>> operations
>>>>>>>>>>> +failed with a return code ``-EBUSY``.
>>>>>>>>>>>
>>>>>>>>>>>   In order to sense the error happening/recovering,
>>>>>>>>>>>   as well as to restore some additional configuration,
>>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>>>>>>>>
>>>>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>>>>>>>>      Notify the application that an error is detected
>>>>>>>>>>> -   and the recovery is being started.
>>>>>>>>>>> +   and the recovery is about to start.
>>>>>>>>>>>      Upon receiving the event, the application should not invoke
>>>>>>>>>>> -   any control path function until receiving
>>>>>>>>>>> +   any control and data path API until receiving
>>>>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or
>>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>>>>>>>>
>>>>>>>>>>>   .. note::
>>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>>>>>>>>
>>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>>>>>>>>      Notify the application that the recovery from error is 
>>>>>>>>>>> successful,
>>>>>>>>>>> -   the PMD already re-configures the port,
>>>>>>>>>>> -   and the effect is the same as a restart operation.
>>>>>>>>>>> +   the PMD already re-configures the port.
>>>>>>>>>>> +   The application should restore some additional configuration,
>>>>>>>>>>> and then
>>>>>>>>>>> +   enable data path API invocation.
>>>>>>>>>>>
>>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>>>>>>>>      Notify the application that the recovery from error failed,
>>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
>>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
>>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev
>>>>>>>>>>> *dev, const char *ring_name,
>>>>>>>>>>>       return rc;
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> +void
>>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>>>>>>>>> +{
>>>>>>>>>>> +    if (dev == NULL)
>>>>>>>>>>> +        return;
>>>>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>   const struct rte_memzone *
>>>>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char
>>>>>>>>>>> *ring_name,
>>>>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
>>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
>>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>>>>>> @@ -1621,6 +1621,16 @@ int
>>>>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const
>>>>>>>>>>> char *name,
>>>>>>>>>>>            uint16_t queue_id);
>>>>>>>>>>>
>>>>>>>>>>> +/**
>>>>>>>>>>> + * @internal
>>>>>>>>>>> + * Setup eth fast-path API to ethdev values.
>>>>>>>>>>> + *
>>>>>>>>>>> + * @param dev
>>>>>>>>>>> + *  Pointer to struct rte_eth_dev.
>>>>>>>>>>> + */
>>>>>>>>>>> +__rte_internal
>>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>>>>>>>>> +
>>>>>>>>>>>   /**
>>>>>>>>>>>    * @internal
>>>>>>>>>>>    * Atomically set the link status for the specific device.
>>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>>>>>>>>> index 049641d57c..44ee7229c1 100644
>>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>>>>>>>>        */
>>>>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>>>>>>>>       /** Port recovering from a hardware or firmware error.
>>>>>>>>>>> -     * If PMD supports proactive error recovery,
>>>>>>>>>>> -     * it should trigger this event to notify application
>>>>>>>>>>> -     * that it detected an error and the recovery is being started.
>>>>>>>>>>> -     * Upon receiving the event, the application should not invoke
>>>>>>>>>>> any control path API
>>>>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until
>>>>>>>>>>> receiving
>>>>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or
>>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
>>>>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
>>>>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>>> -     * It means that the application cannot send or receive any
>>>>>>>>>>> packets
>>>>>>>>>>> -     * during this period.
>>>>>>>>>>> +     *
>>>>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger
>>>>>>>>>>> this
>>>>>>>>>>> +     * event to notify application that it detected an error and 
>>>>>>>>>>> the
>>>>>>>>>>> +     * recovery is about to start.
>>>>>>>>>>> +     *
>>>>>>>>>>> +     * Upon receiving the event, the application should not invoke 
>>>>>>>>>>> any
>>>>>>>>>>> +     * control and data path API until receiving
>>>>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or 
>>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED
>>>>>>>>>>> +     * event.
>>>>>>>>>>> +     *
>>>>>>>>>>> +     * Once this event is reported, the PMD will set the data path
>>>>>>>>>>> pointers
>>>>>>>>>>> +     * to dummy functions, and re-set the data path pointers to 
>>>>>>>>>>> valid
>>>>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
>>>>>>>>>>> event.
>>>>>>>>>>> +     *
>>>>>>>>>>>        * @note Before the PMD reports the recovery result,
>>>>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
>>>>>>>>>>> again,
>>>>>>>>>>>        * because a larger error may occur during the recovery.
>>>>>>>>>>>        */
>>>>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>>>>       /** Port recovers successfully from the error.
>>>>>>>>>>> -     * The PMD already re-configured the port,
>>>>>>>>>>> -     * and the effect is the same as a restart operation.
>>>>>>>>>>> +     *
>>>>>>>>>>> +     * The PMD already re-configured the port:
>>>>>>>>>>>        * a) The following operation will be retained: 
>>>>>>>>>>> (alphabetically)
>>>>>>>>>>>        *    - DCB configuration
>>>>>>>>>>>        *    - FEC configuration
>>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>>>>>>>>        * c) Any other configuration will not be stored
>>>>>>>>>>>        *    and will need to be re-configured.
>>>>>>>>>>> +     *
>>>>>>>>>>> +     * The application should restore some additional configuration
>>>>>>>>>>> +     * (see above case b/c), and then enable data path API 
>>>>>>>>>>> invocation.
>>>>>>>>>>>        */
>>>>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>>>>>>>>       /** Port recovery failed.
>>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
>>>>>>>>>>> --- a/lib/ethdev/version.map
>>>>>>>>>>> +++ b/lib/ethdev/version.map
>>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>>>>>>>>       rte_eth_devices;
>>>>>>>>>>>       rte_eth_dma_zone_free;
>>>>>>>>>>>       rte_eth_dma_zone_reserve;
>>>>>>>>>>> +    rte_eth_fp_ops_setup;
>>>>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
>>>>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
>>>>>>>>>>>       rte_eth_hairpin_queue_peer_update;
>>>>>>>>>>> --
>>>>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.anan...@huawei.com>
>>>>>>>>>>
>>>>>>>>>>> 2.17.1
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
> 
> .
> 

Reply via email to