> -----Original Message-----
> From: fengchengwen <fengcheng...@huawei.com>
> Sent: Thursday, March 9, 2023 5:31 AM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Konstantin
> Ananyev <konstantin.v.anan...@yandex.ru>; dev@dpdk.org;
> tho...@monjalon.net; Ferruh Yigit <ferruh.yi...@amd.com>; Andrew
> Rybchenko <andrew.rybche...@oktetlabs.ru>; Kalesh AP
> <kalesh-anakkur.pura...@broadcom.com>; Ajit Khaparde
> (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>
> Cc: nd <n...@arm.com>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error
> handling mode
>
> On 2023/3/9 11:03, Honnappa Nagarahalli wrote:
> >
> >> -----Original Message-----
> >> From: fengchengwen <fengcheng...@huawei.com>
> >> Sent: Wednesday, March 8, 2023 7:00 PM
> >> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Konstantin
> >> Ananyev <konstantin.v.anan...@yandex.ru>; dev@dpdk.org;
> >> tho...@monjalon.net; Ferruh Yigit <ferruh.yi...@amd.com>; Andrew
> >> Rybchenko <andrew.rybche...@oktetlabs.ru>; Kalesh AP
> >> <kalesh-anakkur.pura...@broadcom.com>; Ajit Khaparde
> >> (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>
> >> Cc: nd <n...@arm.com>
> >> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive
> >> error handling mode
> >>
> >> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
> >>> <snip>
> >>>
> >>>>>>> Is there any reason not to design this in the same way as
> >>>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>>>>>
> >>>>>> I suppose it is a question for the authors of original patch...
> >>>>> Appreciate if the authors could comment on this.
> >>>>
> >>>> The main cause is a hardware implementation limit. I will
> >>>> try to explain from the hns3 PMD's view.
> >>>> For a global reset, all the functions need to respond within a
> >>>> certain period of time; otherwise, the reset will fail,
> >>>> and also the reset requires a few steps (all of which may take a
> >>>> long time).
> >>>>
> >>>> With multiple functions in one DPDK application, when a global
> >>>> reset is triggered, rte_eth_dev_reset does not cover this scenario:
> >>>> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt
> >>>>    thread.
> >>>> 2. The application callback is then invoked, but because it runs in
> >>>>    the same thread and each port's recovery takes a long time, the
> >>>>    later ports' resets fail.
> >
> > I am reading this again. What you are saying is, a single thread
> > running the recovery process in sequence for multiple ports will not
> > meet the required time limits. Hence, the recovery process needs to
> > run in multiple threads simultaneously. This way each thread could
> > run the recovery for a different port. Do I understand this correctly?
>
> No.
> It's not realistic to have threads on every port.
>
> >
> > (Assuming my understanding is correct) The current implementation is
> > running the recovery process in the context of data plane threads and
> > not in the interrupt thread. Is this correct?
>
> No, the recovery process is running in the interrupt thread.

Ok.
> >
> >>> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
> >>> rte_eth_dev_recover, what problems do you see?
> >>
> >> I see no difference between 'RTE_ETH_EVENT_INTR_RECOVER and
> >> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
> >> Could you give more detail?

They are similar, i.e. we use RTE_ETH_EVENT_INTR_RECOVER to indicate
that it is a recovery interrupt (not a reset event). The recovery
process is called through a new rte_eth_dev_recover API.
What problems do you see with it? I am unable to understand the
problems you have described above.

> >>
> >>>>>>> We could have a similar API 'rte_eth_dev_recover' to do the
> >>>>>>> recovery functionality.
> >>>>>>
> >>>>>> I suppose such an approach is also possible.
> >>>>>> Personally I am fine with both ways: either the existing one or
> >>>>>> what you propose, as long as we fix the existing race condition.
> >>>>>> What is good with what you suggest - that way we probably don't
> >>>>>> need to worry how to allow the user to enable/disable
> >>>>>> auto-recovery inside the PMD.
> >>>>>>
> >>>>>> Konstantin
> >>>>>>
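
To make the timing concern quoted above a bit more concrete, here is a
minimal stand-alone sketch in plain C (no DPDK calls). It only models the
two steps listed for the hns3 case: one interrupt thread runs the
application callback for each port in turn, so each port's recovery starts
only after the previous one finishes. The helper name
simulate_serial_recovery and all the numbers are hypothetical, chosen for
illustration; they are not hns3 hardware figures or ethdev API.

```c
#include <stdio.h>

/*
 * Hypothetical model, not DPDK code: a single thread handles the reset
 * event of every port one after another. Each port needs recovery_ms to
 * recover (assumed value), and the hardware expects every function to
 * have responded within deadline_ms of the global reset (assumed value).
 *
 * Returns the number of ports that finish after the deadline.
 */
int simulate_serial_recovery(int num_ports, int recovery_ms, int deadline_ms)
{
    int elapsed_ms = 0; /* time since the global reset was reported */
    int failed = 0;

    for (int port = 0; port < num_ports; port++) {
        /* The callback for this port only starts once the previous
         * ports are done, so completion time grows linearly. */
        elapsed_ms += recovery_ms;

        if (elapsed_ms > deadline_ms) {
            failed++;
            printf("port %d: done at %d ms, deadline missed\n",
                   port, elapsed_ms);
        } else {
            printf("port %d: done at %d ms, ok\n", port, elapsed_ms);
        }
    }
    return failed;
}
```

With four ports, 300 ms per recovery and a 1000 ms window, the first three
ports finish in time and the fourth does not, which is the "later port
will reset failed" case from the list above. Whether a separate
rte_eth_dev_recover API would change this depends on which thread ends up
calling it, which is the open question in this thread.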