Re: [RFC] devlink: health: add remediation type

2021-03-09 Thread Jacob Keller
On 3/9/2021 2:52 PM, Jakub Kicinski wrote: > On Tue, 9 Mar 2021 16:18:58 +0200 Eran Ben Elisha wrote: DLH_REMEDY_LOCAL_FIX: associated component will undergo a local un-harmful fix attempt. (e.g look for lost interrupt in mlx5e_tx_reporter_timeout_recover()) >>> >>> Should we make

Re: [RFC] devlink: health: add remediation type

2021-03-09 Thread Jakub Kicinski
On Tue, 9 Mar 2021 16:18:58 +0200 Eran Ben Elisha wrote: > >> DLH_REMEDY_LOCAL_FIX: associated component will undergo a local > >> un-harmful fix attempt. > >> (e.g look for lost interrupt in mlx5e_tx_reporter_timeout_recover()) > > > > Should we make it more specific? Maybe DLH_REMEDY_STALL: de

Re: [RFC] devlink: health: add remediation type

2021-03-09 Thread Jakub Kicinski
On Tue, 9 Mar 2021 16:06:49 +0200 Eran Ben Elisha wrote: > On 3/8/2021 7:59 PM, Jakub Kicinski wrote: > >> Hm, export and extend devlink_health_reporter_state? I like that idea. > > > > Trying to type it up it looks less pretty than expected. > > > > Let's look at some examples. > > > > A que

Re: [RFC] devlink: health: add remediation type

2021-03-09 Thread Eran Ben Elisha
On 3/8/2021 7:16 PM, Jakub Kicinski wrote: On Sun, 7 Mar 2021 17:59:58 +0200 Eran Ben Elisha wrote: On 3/6/2021 4:42 AM, Jakub Kicinski wrote: Currently devlink health does not give user any clear information of what kind of remediation ->recover callback will perform. This makes it difficul

Re: [RFC] devlink: health: add remediation type

2021-03-09 Thread Eran Ben Elisha
On 3/8/2021 7:59 PM, Jakub Kicinski wrote: On Mon, 8 Mar 2021 09:16:00 -0800 Jakub Kicinski wrote: + DLH_REMEDY_BAD_PART, BAD_PART probably indicates that the reporter (or any command line execution) cannot recover the issue. As the suggested remedy is static per reporter's recover met

Re: [RFC] devlink: health: add remediation type

2021-03-08 Thread Jakub Kicinski
On Mon, 8 Mar 2021 09:16:00 -0800 Jakub Kicinski wrote: > > > + DLH_REMEDY_BAD_PART, > > BAD_PART probably indicates that the reporter (or any command line > > execution) cannot recover the issue. > > As the suggested remedy is static per reporter's recover method, it > > doesn't make sense f

Re: [RFC] devlink: health: add remediation type

2021-03-08 Thread Jakub Kicinski
On Sun, 7 Mar 2021 17:59:58 +0200 Eran Ben Elisha wrote: > On 3/6/2021 4:42 AM, Jakub Kicinski wrote: > > Currently devlink health does not give user any clear information > > of what kind of remediation ->recover callback will perform. This > > makes it difficult to understand the impact of enabli

Re: [RFC] devlink: health: add remediation type

2021-03-07 Thread Eran Ben Elisha
On 3/6/2021 4:42 AM, Jakub Kicinski wrote: Currently devlink health does not give user any clear information of what kind of remediation ->recover callback will perform. This makes it difficult to understand the impact of enabling auto-remediation, and the severity of the error itself. To a

Re: [RFC] devlink: health: add remediation type

2021-03-06 Thread Jakub Kicinski
On Sat, 6 Mar 2021 15:48:11 +0100 Andrew Lunn wrote: > +/** > > + * enum devlink_health_reporter_remedy - severity of remediation procedure > > + * @DLH_REMEDY_NONE: transient error, no remediation required > > + * @DLH_REMEDY_COMP_RESET: associated device component (e.g. device queue) > > + *

Re: [RFC] devlink: health: add remediation type

2021-03-06 Thread Andrew Lunn
+/** > + * enum devlink_health_reporter_remedy - severity of remediation procedure > + * @DLH_REMEDY_NONE: transient error, no remediation required > + * @DLH_REMEDY_COMP_RESET: associated device component (e.g. device queue) > + * will be reset > + * @DLH_REMEDY_RESET: full devic

[RFC] devlink: health: add remediation type

2021-03-05 Thread Jakub Kicinski
Currently devlink health does not give user any clear information of what kind of remediation ->recover callback will perform. This makes it difficult to understand the impact of enabling auto-remediation, and the severity of the error itself. To allow users to make more informed decision, as we