On 10/12/2020 9:09 AM, Andrew Rybchenko wrote:
On 10/12/20 12:29 AM, Thomas Monjalon wrote:
09/10/2020 05:48, Kalesh A P:
From: Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
Adding support for device reset and recovery events in the
rte_eth_event framework. FW error and FW reset conditions would be
managed internally by PMD without needing application intervention.
In such cases, PMD would need reset/recovery events to notify application
that PMD is undergoing a reset.
Signed-off-by: Somnath Kotur <somnath.ko...@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khapa...@broadcom.com>
Reviewed-by: Asaf Penso <as...@nvidia.com>
The ethdev maintainers are not Cc'ed.
Please use the option --cc-cmd devtools/get-maintainer.sh
+Error recovery support
+~~~~~~~~~~~~~~~~~~~~~~
+
+When the PMD detects a FW reset or error condition, it will try to recover
+from the error without needing application intervention. In such cases,
+the PMD needs events to notify the application that it is undergoing
+an error recovery.
+
+The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
+application that PMD detected a FW reset or FW error condition. PMD will
+try to recover from the error by itself. Data path will be halted and
+control path operations would fail during the recovery period.
+
+The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application
+that it has recovered from the error condition. Control path and data path
+are up now. Since the device has undergone a reset, flow rules offloaded prior
+to the reset will be lost and the application has to recreate the rules.
What should be done if the state is not recoverable?
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 9759f13..9b4b015 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -3207,6 +3207,23 @@ enum rte_eth_event_type {
RTE_ETH_EVENT_DESTROY, /**< port is released */
RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */
RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
+ RTE_ETH_EVENT_ERR_RECOVERING,
+ /**< port recovering from an error
+ *
+ * PMD detected a FW reset or error condition.
+ * PMD will try to recover from the error.
+ * Data path will be halted and Control path operations
+ * would fail at this time.
+ */
Does it mean the application has nothing to do when receiving this event?
I think the app should stop polling at least.
+ RTE_ETH_EVENT_RECOVERED,
+ /**< port recovered from an error
+ *
+ * PMD has recovered from the error condition.
+ * Control path and Data path are up now.
+ * Since the device has undergone a reset, flow rules
+ * offloaded prior to the reset will be lost and
+ * the application has to recreate the rules again.
+ */
Please be more precise.
Should the app re-configure the port, setup the queues, start the port?
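
For illustration only, here is a minimal sketch of the application-side behaviour that the patch text and the comments above imply: stop polling on the recovering event, then resume and mark flow rules for re-creation on the recovered event. The enum values, struct port_state and handle_port_event() are hypothetical stand-ins so the fragment compiles without DPDK headers; a real application would use the event values from rte_ethdev.h inside its registered event callback. Whether the port must also be reconfigured or restarted is the open question in this thread, and the sketch does not answer it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two event values proposed in the patch
 * (RTE_ETH_EVENT_ERR_RECOVERING and RTE_ETH_EVENT_RECOVERED). */
enum port_event {
	EVENT_ERR_RECOVERING,
	EVENT_RECOVERED,
};

/* Hypothetical per-port state kept by the application. */
struct port_state {
	bool polling_enabled;   /* data path may poll the port */
	bool flows_need_replay; /* offloaded flow rules must be re-created */
};

/* React to a recovery-related event for one port. */
static void
handle_port_event(struct port_state *st, enum port_event ev)
{
	switch (ev) {
	case EVENT_ERR_RECOVERING:
		/* Data path is halted and control path calls would fail,
		 * so stop polling until recovery completes. */
		st->polling_enabled = false;
		break;
	case EVENT_RECOVERED:
		/* The patch states that flow rules are lost across the
		 * reset; remember to re-create them, then resume polling. */
		st->flows_need_replay = true;
		st->polling_enabled = true;
		break;
	}
}
```

The data-path loop would check polling_enabled before each burst call, and the control thread would re-create its flow rules and clear flows_need_replay once it sees the flag set.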
Hi Kalesh Anakkur,
The mechanics of notifying the application look good, but the concerns seem
to be more about what the application should do with this information.
PMD notifies the application on the FW/HW reset and pushes some
tasks/responsibilities to the application, but for this to be useful, these
tasks should be clear to the application.
Put yourself in the position of developing an application and receiving
these events from a device whose internals you don't know: what would
you do?
Both Thomas and Andrew raised cases that need more clarification for the application.
Can you please send a new version with those clarifications?
Thanks,
ferruh