> -----Original Message-----
> From: fengchengwen <fengcheng...@huawei.com>
> Sent: Wednesday, March 8, 2023 7:00 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Konstantin
> Ananyev <konstantin.v.anan...@yandex.ru>; dev@dpdk.org;
> tho...@monjalon.net; Ferruh Yigit <ferruh.yi...@amd.com>; Andrew
> Rybchenko <andrew.rybche...@oktetlabs.ru>; Kalesh AP
> <kalesh-anakkur.pura...@broadcom.com>; Ajit Khaparde
> (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>
> Cc: nd <n...@arm.com>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error
> handling mode
>
>
>
> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>>>>>>
> >>>>>
> >>>>> Is there any reason not to design this in the same way as
> >>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>>>
> >>>> I suppose it is a question for the authors of the original patch...
> >>> I would appreciate it if the authors could comment on this.
> >>
> >> The main cause is a limitation of the hardware implementation; I will
> >> try to explain from the hns3 PMD's point of view.
> >> For a global reset, all the functions need to respond within a certain
> >> period of time, otherwise the reset will fail. The reset also requires
> >> several steps, all of which may take a long time.
> >>
> >> When multiple functions are used in one DPDK process and a global reset
> >> is triggered, rte_eth_dev_reset does not cover this scenario:
> >> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
> >> 2. The application callback is then invoked in that same thread; since
> >> each port's recovery takes a long time, the later ports fail to reset.

I am reading this again. What you are saying is that a single thread running the recovery process sequentially for multiple ports will not meet the required time limits. Hence, the recovery process needs to run in multiple threads simultaneously, so that each thread can run the recovery for a different port. Do I understand this correctly?
(Assuming my understanding is correct) The current implementation runs the recovery process in the context of the data plane threads, not in the interrupt thread. Is this correct?

> > If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
> > rte_eth_dev_recover, what problems do you see?
>
> I see no difference between 'RTE_ETH_EVENT_INTR_RECOVER and
> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
> Could you explain in more detail?
>
> >
> >>
> >>>
> >>>>
> >>>>> We could have a similar API 'rte_eth_dev_recover' to do the
> >>>>> recovery
> >>>> functionality.
> >>>>
> >>>> I suppose such an approach is also possible.
> >>>> Personally I am fine with both ways, either the existing one or what
> >>>> you propose, as long as we fix the existing race condition.
> >>>> What is good about your suggestion is that we probably don't
> >>>> need to worry about how to let the user enable/disable auto-recovery
> >>>> inside the PMD.
> >>>>
> >>>> Konstantin
> >>>>
> >>>