Wed, Sep 26, 2018 at 01:52:58PM CEST, era...@mellanox.com wrote:
>The exception spec is targeted for Real Time Alerting, in order to know when
>something bad had happened to a PCI device
>- Provide alert debug information
>- Self healing
>- If problem needs vendor support, provide a way to gather all needed debugging
>  information.
>
>The exception mechanism contains condition checkers which sense for
>malfunction. Upon a condition hit,
>actions such as logs and correction can be taken.
>
>The condition checkers are divided into the following groups
>- Hardware - a checker which is triggered by the device due to
>  malfunction.
>- Software - a checker which is triggered by the software due to
>  malfunction.

What do you mean by a "software malfunction", a "FW malfunction"?
Also, I don't see these 2 groups in the man page.

>Both groups of condition checkers can be triggered due to error event or due
>to a periodic check.
>
>Actions are the way to handle those events. Action can be in one of the
>following groups:
>- Dump - SW trace, SW dump, HW trace, HW dump
>- Reset - Surgical correction (e.g. modify Q, flush Q, reset of device, etc)
>Actions can be performed by SW or HW.
>
>User is allowed to enable or disable condition checkers and its action mapping.
>
>This RFC man page patch describes the suggested API of devlink-exception in
>order to control conditions and actions.
>
>V2:
>* Renaming terms:
>    health -> exception
>    sensor -> condition
>* Remove reinit command and merge with action command.
>* Consmetics in grammer.
>
>Eran Ben Elisha (1):
>  man: Add devlink exception man page
>
> man/man8/devlink-exception.8 | 158 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 158 insertions(+)
> create mode 100644 man/man8/devlink-exception.8
>
>--
>1.8.3.1
>