RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

Honnappa Nagarahalli Thu, 09 Mar 2023 19:25:58 -0800


> -----Original Message-----
> From: fengchengwen <[email protected]>
> Sent: Thursday, March 9, 2023 5:31 AM
> To: Honnappa Nagarahalli <[email protected]>; Konstantin
> Ananyev <[email protected]>; [email protected];
> [email protected]; Ferruh Yigit <[email protected]>; Andrew
> Rybchenko <[email protected]>; Kalesh AP <kalesh-
> [email protected]>; Ajit Khaparde
> ([email protected]) <[email protected]>
> Cc: nd <[email protected]>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error
> handling mode
> 
> 
> 
> On 2023/3/9 11:03, Honnappa Nagarahalli wrote:
> >
> >
> >> -----Original Message-----
> >> From: fengchengwen <[email protected]>
> >> Sent: Wednesday, March 8, 2023 7:00 PM
> >> To: Honnappa Nagarahalli <[email protected]>; Konstantin
> >> Ananyev <[email protected]>; [email protected];
> >> [email protected]; Ferruh Yigit <[email protected]>; Andrew
> >> Rybchenko <[email protected]>; Kalesh AP <kalesh-
> >> [email protected]>; Ajit Khaparde
> >> ([email protected]) <[email protected]>
> >> Cc: nd <[email protected]>
> >> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive
> >> error handling mode
> >>
> >>
> >>
> >> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
> >>> <snip>
> >>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>> Is there any reason not to design this in the same way as
> >>>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>>>>>
> >>>>>> I suppose it is a question for the authors of original patch...
> >>>>> Appreciate if the authors could comment on this.
> >>>>
> >>>> The main cause is a hardware implementation limit; I will
> >>>> try to explain from the hns3 PMD's view.
> >>>> For a global reset, all the functions need to respond within a
> >>>> certain period of time; otherwise, the reset will fail. The reset
> >>>> also requires a few steps (all of which may take a long time).
> >>>>
> >>>> With multiple functions in one DPDK process, when a global reset
> >>>> is triggered, rte_eth_dev_reset will not cover this scenario:
> >>>> 1. Each port will report RTE_ETH_EVENT_INTR_RESET in the interrupt
> >>>>    thread.
> >>>> 2. The application callback is then invoked, but because it runs in
> >>>>    the same thread and each port's recovery takes a long time, the
> >>>>    later ports will fail to reset.
> > I am reading this again. What you are saying is, a single thread running the
> > recovery process in sequence for multiple ports will not meet the required
> > time limits. Hence, the recovery process needs to run in multiple threads
> > simultaneously. This way each thread could run the recovery for a different
> > port. Do I understand this correctly?
> 
> No
> It's not realistic to have threads on every port.
> 
> >
> > (Assuming my understanding is correct) The current implementation is
> > running the recovery process in the context of data plane threads and not in
> > the interrupt thread. Is this correct?
> 
> No, the recovery process is running in the interrupt thread.
Ok.


> 
> >
> >>> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
> >>> rte_eth_dev_recover, what problems do you see?
> >>
> >> I see no difference between 'RTE_ETH_EVENT_INTR_RECOVER and
> >> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
> >> Could you give more detail?
They are similar, i.e. we use RTE_ETH_EVENT_INTR_RECOVER to indicate that it is
a recovery interrupt (not a reset event). The recovery process is invoked
through a new rte_eth_dev_recover API. What problems do you see with it?
I am unable to understand the problems you have described above.
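
For illustration only, here is a minimal sketch of the flow being proposed, assuming the PMD only raises an event and the application decides when to run recovery. It is a standalone mock, not DPDK code: RTE_ETH_EVENT_INTR_RECOVER and rte_eth_dev_recover are names proposed in this thread, and every mock_* and app_* identifier below is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define MOCK_MAX_PORTS 4

/* Mock event IDs. INTR_RECOVER mirrors the name proposed in this thread;
 * it is an assumption, not an existing ethdev event. */
enum mock_eth_event {
	MOCK_ETH_EVENT_INTR_RESET,
	MOCK_ETH_EVENT_INTR_RECOVER,
};

typedef int (*mock_event_cb)(uint16_t port_id, enum mock_eth_event ev);

static mock_event_cb event_cb[MOCK_MAX_PORTS];
static int recover_pending[MOCK_MAX_PORTS];
static int recover_count[MOCK_MAX_PORTS];

/* Hypothetical stand-in for the proposed rte_eth_dev_recover(). */
int mock_eth_dev_recover(uint16_t port_id)
{
	if (port_id >= MOCK_MAX_PORTS)
		return -1;
	recover_count[port_id]++;
	return 0;
}

/* Mock of per-port event callback registration. */
int mock_callback_register(uint16_t port_id, mock_event_cb cb)
{
	if (port_id >= MOCK_MAX_PORTS || cb == NULL)
		return -1;
	event_cb[port_id] = cb;
	return 0;
}

/* PMD side: only notify the application; do not recover here. */
int mock_pmd_raise_recover(uint16_t port_id)
{
	if (port_id >= MOCK_MAX_PORTS || event_cb[port_id] == NULL)
		return -1;
	return event_cb[port_id](port_id, MOCK_ETH_EVENT_INTR_RECOVER);
}

/* Application callback: just record that recovery is needed. */
int app_event_cb(uint16_t port_id, enum mock_eth_event ev)
{
	if (ev == MOCK_ETH_EVENT_INTR_RECOVER)
		recover_pending[port_id] = 1;
	return 0;
}

/* Application control path: recover every pending port and
 * return how many ports were recovered. */
int app_handle_pending(void)
{
	int done = 0;

	for (uint16_t p = 0; p < MOCK_MAX_PORTS; p++) {
		if (!recover_pending[p])
			continue;
		/* A real application would stop Rx/Tx on this port first. */
		if (mock_eth_dev_recover(p) == 0) {
			recover_pending[p] = 0;
			done++;
		}
	}
	return done;
}
```

In this sketch the callback does no recovery work itself, so the application controls when the data plane stops using the port. Whether that ordering can meet the hardware time limits described above is exactly the open question.
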

> >>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> We could have a similar API 'rte_eth_dev_recover' to do the
> >>>>>>> recovery functionality.
> >>>>>>
> >>>>>> I suppose such approach is also possible.
> >>>>>> Personally I am fine with both ways: either existing one or what
> >>>>>> you propose, as long as we'll fix existing race-condition.
> >>>>>> What is good with what you suggest - that way we probably don't
> >>>>>> need to worry how to allow user to enable/disable auto-recovery
> >>>>>> inside PMD.
> >>>>>>
> >>>>>> Konstantin
> >>>>>>
> >>>>>
