Chengwen Feng <fengcheng...@huawei.com> writes:
> From: Kalesh AP <kalesh-anakkur.pura...@broadcom.com> > > Some PMDs (e.g. hns3) could detect hardware or firmware errors, and try > to recover from the errors. In this process, the PMD sets the data path > pointers to dummy functions (which will prevent the crash), and also > make sure the control path operations failed with retcode -EBUSY. > > Also in this process, from the perspective of application, services are > affected. For example, the Rx/Tx bust APIs cannot receive and send > packets, and the control plane API return failure. > > In some service scenarios, application needs to be aware of the event > to determine whether to migrate services. So three events were > introduced: > > 1. RTE_ETH_EVENT_ERR_RECOVERING: the PMD must trigger this event to > notify the application that it detected a hardware or firmware error > and tries to recover. > 2. RTE_ETH_EVENT_RECOVER_SUCCESS: the PMD must trigger this event to > notify the application that it has recovered from the error. And PMD > already re-configures the port to the state prior to the error. > 3. RTE_ETH_EVENT_RECOVER_FAILED: the PMD must trigger this event to > notify the application that it has failed to recover from the error. > The port may not be usable anymore. > > Note: the error recovery of these events is mainly performed by the > PMD. Unlike the RTE_ETH_EVENT_INTR_RESET which the error recovery is > performed by the application. The PMD must ensure that the above two > error handling methods cannot be used at the same time. > > Signed-off-by: Kalesh AP <kalesh-anakkur.pura...@broadcom.com> > Signed-off-by: Somnath Kotur <somnath.ko...@broadcom.com> > Signed-off-by: Chengwen Feng <fengcheng...@huawei.com> > Reviewed-by: Ajit Khaparde <ajit.khapa...@broadcom.com> > --- > doc/guides/prog_guide/poll_mode_drv.rst | 32 +++++++++++++++++++++++++ > doc/guides/rel_notes/release_22_07.rst | 11 +++++++++ > lib/ethdev/rte_ethdev.h | 6 +++++ > 3 files changed, 49 insertions(+) > > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst > b/doc/guides/prog_guide/poll_mode_drv.rst > index 9d081b1cba..6398917485 100644 > --- a/doc/guides/prog_guide/poll_mode_drv.rst > +++ b/doc/guides/prog_guide/poll_mode_drv.rst > @@ -627,3 +627,35 @@ by application. > The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger > the application to handle reset event. It is duty of application to > handle all synchronization before it calls rte_eth_dev_reset(). > + > +Error Recovery Notification > +~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Some PMDs (e.g. hns3) could detect hardware or firmware errors, and try to > +recover from the errors. In this process, the PMD sets the data path pointers > +to dummy functions (which will prevent the crash), and also make sure the > +control path operations failed with retcode -EBUSY. > + > +Also in this process, from the perspective of application, services are > +affected. For example, the Rx/Tx bust APIs cannot receive and send packets, > +and the control plane API return failure. > + > +In some service scenarios, application needs to be aware of the event to > +determine whether to migrate services. So three events was introduced. > + > +The PMD must trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the > +application that it detected a hardware or firmware error and tries to > recover. > + > +The PMD must trigger RTE_ETH_EVENT_RECOVER_SUCCESS event to notify the > +application that it has recovered from the error. And PMD already > re-configures > +the port to the state prior to the error. > + > +The PMD must trigger RTE_ETH_EVENT_RECOVER_FAILED event to notify the > +application that it has failed to recover from the error. The port may not be > +usable anymore. > + > +.. note:: > + The error recovery of these events is mainly performed by the PMD. > + Unlike the RTE_ETH_EVENT_INTR_RESET which the error recovery is > + performed by the application. The PMD must ensure that the above two > + error handling methods cannot be used at the same time. > diff --git a/doc/guides/rel_notes/release_22_07.rst > b/doc/guides/rel_notes/release_22_07.rst > index 6fc044edaa..b237bd3303 100644 > --- a/doc/guides/rel_notes/release_22_07.rst > +++ b/doc/guides/rel_notes/release_22_07.rst > @@ -108,6 +108,17 @@ New Features > > Added an API which can get the device type of vDPA device. > > +* **Added error recover notification.** > + > + Added error recover notification to application including: > + > + * Added new event: ``RTE_ETH_EVENT_ERR_RECOVERING`` for the PMD to report > + that the port is recovering from an error. > + * Added new event: ``RTE_ETH_EVENT_RECOVER_SUCCESS`` for the PMD to report > + that the port recover successful from an error. RTE_ETH_EVENT_RECOVERY_SUCCESS > + * Added new event: ``RTE_ETH_EVENT_RECOVER_FAILED`` for the PMD to > report RTE_ETH_EVENT_RECOVERY_FAILED > + that the prot recover failed from an error to report that port recovery failed > + > * **Updated Amazon ena driver.** > > The new driver version (v2.7.0) includes: > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h > index 045ee64747..6998f6f0be 100644 > --- a/lib/ethdev/rte_ethdev.h > +++ b/lib/ethdev/rte_ethdev.h > @@ -3928,6 +3928,12 @@ enum rte_eth_event_type { > * @see rte_eth_rx_avail_thresh_set() > */ > RTE_ETH_EVENT_RX_AVAIL_THRESH, > + /** Port recovering from a hardware or firmware error */ > + RTE_ETH_EVENT_ERR_RECOVERING, > + /** Port recovers successful from the error */ > + RTE_ETH_EVENT_RECOVER_SUCCESS, RTE_ETH_EVENT_RECOVERY_SUCCESS > + /** Port recovers failed from the error */ > + RTE_ETH_EVENT_RECOVER_FAILED, RTE_ETH_EVENT_RECOVERY_FAILED > RTE_ETH_EVENT_MAX /**< max value of this enum */ > }; -- Regards, Ray K