On 2023/3/7 13:34, Honnappa Nagarahalli wrote:
> 
> 
>> -----Original Message-----
>> From: Konstantin Ananyev <konstantin.v.anan...@yandex.ru>
>> Sent: Sunday, March 5, 2023 9:24 AM
>> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
>> dev@dpdk.org; Chengwen Feng <fengcheng...@huawei.com>;
>> tho...@monjalon.net; Ferruh Yigit <ferruh.yi...@amd.com>; Andrew
>> Rybchenko <andrew.rybche...@oktetlabs.ru>; Kalesh AP <kalesh-
>> anakkur.pura...@broadcom.com>; Ajit Khaparde
>> (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>
>> Cc: nd <n...@arm.com>
>> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error 
>> handling
>> mode
>>
>>
>>>>>>
>>>>>> In the proactive error handling mode, the PMD will set the data
>>>>>> path pointers to dummy functions and then try recovery, in this
>>>>>> period the application may still invoking data path API. This will
>>>>>> introduce a race-condition with data path which may lead to crash [1].
>>>>>>
>>>>>> Although the PMD added delay after setting data path pointers to
>>>>>> cover the above race-condition, it reduces the probability, but it
>>>>>> doesn't solve the problem.
>>>>>>
>>>>>> To solve the race-condition problem fundamentally, the following
>>>>>> requirements are added:
>>>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>> Do you mean to say, PMD should set the data path pointers after
>>>>> calling the
>>>> call back function?
>>>>> The PMD is running in the context of multiple EAL threads. How do
>>>>> these
>>>> threads synchronize such that only one thread sets these data pointers?
>>>>
>>>> As I understand this event callback supposed to be called in the
>>>> context of EAL interrupt thread (whoever is more familiar with
>>>> original idea, feel free to correct me if I missed something).
>>> I could not figure this out. It looks to be called from the data plane 
>>> thread
>> context.
>>> I also have a thought on alternate design at the end, appreciate if you can
>> take a look.
>>>
>>>> How it is going to signal data-path threads that they need to
>>>> stop/suspend calling data-path API - that's I suppose is left to 
>>>> application
>> to decide...
>>>> Same as right now it is application responsibility to stop data-path
>>>> threads before doing dev_stop()/dev/_config()/etc.
>>> Ok, good, this expectation is not new. The application must have a
>> mechanism already.
>>>
>>>>
>>>>
>>>>>
>>>>>> 2. The application should stop data path API invocation when process
>>>>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>> Any thoughts on how an application can do this?
>>> We can ignore this question as there is already similar expectation set for
>> earlier functionalities.
>>>
>>>>>
>>>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>> 4. The application should enable data path API invocation when process
>>>>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>> Do you mean to say that the application should not call the datapath
>>>>> APIs
>>>> while the PMD is running the recovery process?
>>>>
>>>> Yes, I believe that's the intention.
>>> Ok, this is good and makes sense.
>>>
>>>>
>>>>>>
>>>>>> Also, this patch introduce a driver internal function
>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>>>
>>>>>> [1]
>>>>>>
>>>>
>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2
>>>>>> -
>>>>>> ashok.k.kal...@intel.com/
>>>>>>
>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>> Cc: sta...@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com>
>>>>>> ---
>>>>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>>>>>    lib/ethdev/version.map                  |  1 +
>>>>>>    5 files changed, 46 insertions(+), 25 deletions(-)
>>>>>>
>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> index c145a9066c..e380ff135a 100644
>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> @@ -638,14 +638,9 @@ different from the application invokes
>>>>>> recovery in PASSIVE mode,  the PMD automatically recovers from
>>>>>> error in PROACTIVE mode,  and only a small amount of work is
>>>>>> required for the
>>>> application.
>>>>>>
>>>>>> -During error detection and automatic recovery, -the PMD sets the
>>>>>> data path pointers to dummy functions -(which will prevent the
>>>>>> crash), -and also make sure the control path operations fail with a
>>>>>> return
>>>> code ``-EBUSY``.
>>>>>> -
>>>>>> -Because the PMD recovers automatically, -the application can only
>>>>>> sense that the data flow is disconnected for a while -and the
>>>>>> control API returns an error in this period.
>>>>>> +During error detection and automatic recovery, the PMD sets the
>>>>>> +data path pointers to dummy functions and also make sure the
>>>>>> +control path operations failed with a return code ``-EBUSY``.
>>>>>>
>>>>>>    In order to sense the error happening/recovering,  as well as to
>>>>>> restore some additional configuration, @@ -653,9 +648,9 @@ three
>>>>>> events
>>>> are available:
>>>>>>
>>>>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>>>       Notify the application that an error is detected
>>>>>> -   and the recovery is being started.
>>>>>> +   and the recovery is about to start.
>>>>>>       Upon receiving the event, the application should not invoke
>>>>>> -   any control path function until receiving
>>>>>> +   any control and data path API until receiving
>>>>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or
>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>>>
>>>>>>    .. note::
>>>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>>>
>>>>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>>>       Notify the application that the recovery from error is successful,
>>>>>> -   the PMD already re-configures the port,
>>>>>> -   and the effect is the same as a restart operation.
>>>>>> +   the PMD already re-configures the port.
>>>>>> +   The application should restore some additional configuration,
>>>>>> + and then
>>>>> What is the additional configuration? Is this specific to each NIC/PMD?
>>>>> I thought, this is an auto recovery process and the application does
>>>>> not require
>>>> to reconfigure anything. If the application has to restore the
>>>> configuration, how does auto recovery differ from typical recovery
>> process?
>>>>>
>>>>>> +   enable data path API invocation.
>>>>>>
>>>>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>>>       Notify the application that the recovery from error failed,
>>>>>> diff --git a/lib/ethdev/ethdev_driver.c
>>>>>> b/lib/ethdev/ethdev_driver.c index
>>>>>> 0be1e8ca04..f994653fe9 100644
>>>>>> --- a/lib/ethdev/ethdev_driver.c
>>>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct
>> rte_eth_dev
>>>>>> *dev, const char *ring_name,
>>>>>>          return rc;
>>>>>>    }
>>>>>>
>>>>>> +void
>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) {
>>>>>> +        if (dev == NULL)
>>>>>> +                return;
>>>>>> +        eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); 
>>>>>> }
>>>>>> +
>>>>>>    const struct rte_memzone *
>>>>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const
>>>>>> char *ring_name,
>>>>>>                           uint16_t queue_id, size_t size, unsigned int 
>>>>>> align, diff
>> -
>>>> -git
>>>>>> a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
>>>>>> 2c9d615fb5..0d964d1f67 100644
>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>> @@ -1621,6 +1621,16 @@ int
>>>>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const
>>>>>> char *name,
>>>>>>                   uint16_t queue_id);
>>>>>>
>>>>>> +/**
>>>>>> + * @internal
>>>>>> + * Setup eth fast-path API to ethdev values.
>>>>>> + *
>>>>>> + * @param dev
>>>>>> + *  Pointer to struct rte_eth_dev.
>>>>>> + */
>>>>>> +__rte_internal
>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>>>> +
>>>>>>    /**
>>>>>>     * @internal
>>>>>>     * Atomically set the link status for the specific device.
>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>>>> index
>>>>>> 049641d57c..44ee7229c1 100644
>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>>>           */
>>>>>>          RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>>>          /** Port recovering from a hardware or firmware error.
>>>>>> -         * If PMD supports proactive error recovery,
>>>>>> -         * it should trigger this event to notify application
>>>>>> -         * that it detected an error and the recovery is being started.
>>>>>> -         * Upon receiving the event, the application should not invoke 
>>>>>> any
>>>>>> control path API
>>>>>> -         * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until 
>>>>>> receiving
>>>>>> -         * RTE_ETH_EVENT_RECOVERY_SUCCESS or
>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>>>> -         * The PMD will set the data path pointers to dummy functions,
>>>>>> -         * and re-set the data path pointers to non-dummy functions
>>>>>> -         * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>> -         * It means that the application cannot send or receive any 
>>>>>> packets
>>>>>> -         * during this period.
>>>>>> +         *
>>>>>> +         * If PMD supports proactive error recovery, it should trigger 
>>>>>> this
>>>>>> +         * event to notify application that it detected an error and the
>>>>>> +         * recovery is about to start.
>>>>>> +         *
>>>>>> +         * Upon receiving the event, the application should not invoke 
>>>>>> any
>>>>>> +         * control and data path API until receiving
>>>>>> +         * RTE_ETH_EVENT_RECOVERY_SUCCESS or
>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED
>>>>>> +         * event.
>>>>>> +         *
>>>>>> +         * Once this event is reported, the PMD will set the data path 
>>>>>> pointers
>>>>>> +         * to dummy functions, and re-set the data path pointers to 
>>>>>> valid
>>>>>> +         * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
>>>>>> event.
>>>>> Why do we need to set the data path pointers to dummy functions if
>>>>> the
>>>> application is restricted from invoking any control and data path
>>>> APIs till the recovery process is completed?
>>>>
>>>> You are right, in theory it is not mandatory.
>>>> Though it helps to flag a problem if user will still try to call them
>>>> while recovery is in progress.
>>> Ok, may be in debug mode.
>>> I mean, we have already set an expectation to the application that it should
>> not call and the application has implemented a method to do the same. Why
>> do we need to complicate this?
>>> If the application calls the APIs, it is a programming error.
>>
>>
>> My preference would be to keep it this way for both debug and non-debug
>> mode.
>> It doesn't cost anything to us in terms of perfomance, but helps to catch
>> problems with wrong behaving app.
> 
> This is also causing a synchronization problem. i.e. if this has to be done 
> correctly, we need to use correct synchronization mechanisms.
> We cannot set the function pointers and assume that data will be visible to 
> other threads/cores in the correct order.
> A possible mechanism (though I see some problems with this) could be to use a 
> guard variable, which indicates when it is safe to use the function pointers 
> on the data plane threads. This would require a load-acquire in the data 
> plane threads.
> 
>>
>>>
>>>> Again, same as we doing in dev_stop().
>>>
>>>>
>>>>>
>>>>>> +         *
>>>>>>           * @note Before the PMD reports the recovery result,
>>>>>>           * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
>>>>>> again,
>>>>>>           * because a larger error may occur during the recovery.
>>>>>>           */
>>>>>>          RTE_ETH_EVENT_ERR_RECOVERING,
>>>>> I understand this is not a change in this patch. But, just
>>>>> wondering, what is the
>>>> purpose of this? How is the application supposed to use this?
>>>>>
>>>>>>          /** Port recovers successfully from the error.
>>>>>> -         * The PMD already re-configured the port,
>>>>>> -         * and the effect is the same as a restart operation.
>>>>>> +         *
>>>>>> +         * The PMD already re-configured the port:
>>>>>>           * a) The following operation will be retained: (alphabetically)
>>>>>>           *    - DCB configuration
>>>>>>           *    - FEC configuration
>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>>>           *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>>>           * c) Any other configuration will not be stored
>>>>>>           *    and will need to be re-configured.
>>>>>> +         *
>>>>>> +         * The application should restore some additional configuration
>>>>>> +         * (see above case b/c), and then enable data path API 
>>>>>> invocation.
>>>>>>           */
>>>>>>          RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>>>          /** Port recovery failed.
>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
>>>>>> 357d1a88c0..c273e0bdae 100644
>>>>>> --- a/lib/ethdev/version.map
>>>>>> +++ b/lib/ethdev/version.map
>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>>>          rte_eth_devices;
>>>>>>          rte_eth_dma_zone_free;
>>>>>>          rte_eth_dma_zone_reserve;
>>>>>> +        rte_eth_fp_ops_setup;
>>>>>>          rte_eth_hairpin_queue_peer_bind;
>>>>>>          rte_eth_hairpin_queue_peer_unbind;
>>>>>>          rte_eth_hairpin_queue_peer_update;
>>>>>> --
>>>>>> 2.17.1
>>>>>
>>>
>>> Is there any reason not to design this in the same way as
>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
>>
>> I suppose it is a question for the authors of original patch...
> Appreciate if the authors could comment on this.

The main cause is that the hardware implementation limit, I will try to
explain from hns3 PMD's view.
For a global reset, all the function need responsed within a centain period
of time. otherwise, the reset will fail. and also the reset requirement a few
steps (all may take a long time).

When with multiple functions in one DPDK, and trigger a global reset, the
rte_eth_dev_reset will not cover this scene:
1. each port's will report RTE_ETH_EVENT_INTR_RESET in interrupt thread.
2. then invoke application callback, but due to the same thread, and each
    port's recover will take a long time, so later port will reset failed.

> 
>>
>>> We could have a similar API 'rte_eth_dev_recover' to do the recovery
>> functionality.
>>
>> I suppose such approach is also possible.
>> Personally I am fine with both ways: either existing one or what you propose,
>> as long as we'll fix existing race-condition.
>> What is good with what you suggest - that way we probably don't need to
>> worry how to allow user to enable/disable auto-recovery inside PMD.
>>
>> Konstantin
>>
> 

Reply via email to