Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

fengchengwen Thu, 09 Mar 2023 03:31:05 -0800

On 2023/3/9 11:03, Honnappa Nagarahalli wrote:
> 
> 
>> -----Original Message-----
>> From: fengchengwen <fengcheng...@huawei.com>
>> Sent: Wednesday, March 8, 2023 7:00 PM
>> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Konstantin
>> Ananyev <konstantin.v.anan...@yandex.ru>; dev@dpdk.org;
>> tho...@monjalon.net; Ferruh Yigit <ferruh.yi...@amd.com>; Andrew
>> Rybchenko <andrew.rybche...@oktetlabs.ru>; Kalesh AP <kalesh-
>> anakkur.pura...@broadcom.com>; Ajit Khaparde
>> (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>
>> Cc: nd <n...@arm.com>
>> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error
>> handling mode
>>
>>
>>
>> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>>>>>>
>>>>>>>
>>>>>>> Is there any reason not to design this in the same way as
>>>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
>>>>>>
>>>>>> I suppose it is a question for the authors of original patch...
>>>>> Appreciate if the authors could comment on this.
>>>>
>>>> The main cause is a hardware implementation limit; I will try to
>>>> explain from the hns3 PMD's point of view.
>>>> For a global reset, all the functions need to respond within a certain
>>>> period of time; otherwise, the reset will fail. The reset also
>>>> requires a few steps (all of which may take a long time).
>>>>
>>>> When there are multiple functions in one DPDK application and a global
>>>> reset is triggered, rte_eth_dev_reset will not cover this scenario:
>>>> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
>>>> 2. The application callback is then invoked, but because all callbacks
>>>>     run in that same thread and each port's recovery takes a long time,
>>>>     the later ports will fail to reset.
> I am reading this again. What you are saying is, a single thread running the 
> recovery process in sequence for multiple ports will not meet the required 
> time limits. Hence, the recovery process needs to run in multiple threads 
> simultaneously. This way each thread could run the recovery for a different 
> port. Do I understand this correctly?

No.
It's not realistic to have a thread for every port.

> 
> (Assuming my understanding is correct) The current implementation is running 
> the recovery process in the context of data plane threads and not in the 
> interrupt thread. Is this correct?

No, the recovery process is running in the interrupt thread.
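
To illustrate the timing problem in the global-reset scenario quoted above
(one interrupt thread invoking the application callback for each port in
turn), here is a minimal sketch in plain C. It is not DPDK or hns3 code:
the function name ports_recovered_in_time() and all time values are
hypothetical, chosen only to show how back-to-back recoveries in a single
thread push the later ports past a fixed hardware deadline.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model, not DPDK code: ports are recovered one after
 * another in a single thread. All time values are illustrative.
 *
 * Returns how many of n_ports finish recovery within deadline_ms when
 * each recovery takes per_port_ms and the recoveries run back to back.
 */
static unsigned int
ports_recovered_in_time(unsigned int n_ports, unsigned int per_port_ms,
			unsigned int deadline_ms)
{
	unsigned int elapsed_ms = 0;
	unsigned int ok = 0;
	unsigned int i;

	assert(per_port_ms > 0);

	for (i = 0; i < n_ports; i++) {
		/* Port i can only start after ports 0..i-1 have finished. */
		elapsed_ms += per_port_ms;
		if (elapsed_ms <= deadline_ms)
			ok++;
		else
			printf("port %u misses the deadline (%u ms > %u ms)\n",
			       i, elapsed_ms, deadline_ms);
	}
	return ok;
}
```

With the assumed numbers of 4 ports, 300 ms per recovery and a 1000 ms
deadline, the ports finish at 300, 600, 900 and 1200 ms, so the fourth one
fails even though each individual recovery is well within the limit.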

> 
>>> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
>>> rte_eth_dev_recover, what problems do you see?
>>
>> I don't see any difference between 'RTE_ETH_EVENT_INTR_RECOVER and
>> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
>> Could you give more detail?
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> We could have a similar API 'rte_eth_dev_recover' to do the
>>>>>>> recovery functionality.
>>>>>>
>>>>>> I suppose such approach is also possible.
>>>>>> Personally I am fine with both ways: either existing one or what
>>>>>> you propose, as long as we'll fix existing race-condition.
>>>>>> What is good about what you suggest: that way we probably don't
>>>>>> need to worry about how to allow the user to enable/disable
>>>>>> auto-recovery inside the PMD.
>>>>>>
>>>>>> Konstantin
>>>>>>
>>>>>