On 9/13/2018 3:08 PM, Andrew Lunn wrote:
devlink health sensor set pci/0000:01:00.0 name TX_COMP_ERROR action reset off action dump on
Sets TX_COMP_ERROR sensor parameters for a specific device.
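To make the proposed syntax concrete, here is a minimal sketch of how the arguments after `devlink health sensor set` could be parsed into a per-sensor configuration. It assumes the grammar is `<dev> name <NAME>` followed by any number of `action <ACTION> on|off` clauses. `SensorConfig` and `parse_sensor_set` are hypothetical names used only for illustration. They are not part of devlink or iproute2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model of the proposed "sensor set" arguments; not an existing devlink API.
@dataclass
class SensorConfig:
    dev: str                                              # e.g. "pci/0000:01:00.0"
    name: str                                             # e.g. "TX_COMP_ERROR"
    actions: dict[str, bool] = field(default_factory=dict)  # action -> enabled


def parse_sensor_set(argv: list[str]) -> SensorConfig:
    """Parse '<dev> name <NAME> [action <ACTION> on|off]...' (assumed grammar)."""
    if len(argv) < 3 or argv[1] != "name":
        raise ValueError("expected: <dev> name <NAME> [action <ACTION> on|off]...")
    cfg = SensorConfig(dev=argv[0], name=argv[2])
    rest = argv[3:]
    # Every action clause is exactly three tokens: "action", the action name, on|off.
    if len(rest) % 3:
        raise ValueError("each clause must be: action <ACTION> on|off")
    for i in range(0, len(rest), 3):
        keyword, action, state = rest[i:i + 3]
        if keyword != "action" or state not in ("on", "off"):
            raise ValueError("bad action clause: " + " ".join(rest[i:i + 3]))
        cfg.actions[action] = state == "on"
    return cfg
```

For the example command above, `parse_sensor_set("pci/0000:01:00.0 name TX_COMP_ERROR action reset off action dump on".split())` yields a config for `TX_COMP_ERROR` with `reset` disabled and `dump` enabled. Repeating the `action` keyword per clause keeps each on/off toggle independent, so a user can change one action without restating the others.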
I hope the real sensors have more understandable names. If I remember
correctly, the same sort of comment was given for resource
management: it was pretty unclear what the resource names actually
meant. Is an average user going to have any idea how to actually use
these sensors and actions?
Well, hopefully. The whole point is to have it fully controlled by the
user. However, the names used in the command should be short. I guess we
should have them documented (the challenge is making that fit multiple
vendors).
Can you give more examples of sensors? We should understand whether there
are any overlaps with hwmon.
To restate: we shall have SW sensors as well, not only HW
sensors.
This is what I had in mind:
1. command interface error
2. command interface timeout
3. stuck TX queue (like tx_timeout)
4. stuck TX completion queue (driver did not process packets in a reasonable time period)
5. stuck RX queue
6. RX completion error
7. TX completion error
8. HW / FW catastrophic error report
9. completion queue overrun
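One way to reconcile short command names with documentation that users can actually follow is to keep a single table that maps each short sensor name to its description. The sketch below does that for the nine sensors listed above. Only `TX_COMP_ERROR` comes from the proposal; every other short name (`CMD_IF_ERROR`, `CQ_OVERRUN`, and so on) and the `describe` helper are hypothetical, chosen only for illustration.

```python
from enum import Enum


# Hypothetical short names for the sensors listed above; only TX_COMP_ERROR
# appears in the original proposal. Values are the human-readable descriptions.
class HealthSensor(Enum):
    CMD_IF_ERROR = "command interface error"
    CMD_IF_TIMEOUT = "command interface timeout"
    TX_QUEUE_STUCK = "stuck TX queue (like tx_timeout)"
    TX_COMP_QUEUE_STUCK = "stuck TX completion queue"
    RX_QUEUE_STUCK = "stuck RX queue"
    RX_COMP_ERROR = "RX completion error"
    TX_COMP_ERROR = "TX completion error"
    FW_CATASTROPHIC = "HW / FW catastrophic error report"
    CQ_OVERRUN = "completion queue overrun"


def describe(name: str) -> str:
    """Return the description for a short sensor name; raises KeyError if unknown."""
    return HealthSensor[name].value
```

With this layout, `describe("TX_COMP_ERROR")` returns `"TX completion error"`, and the same table could be used to generate per-vendor documentation. Whether names like these are clear enough for an average user is exactly the open question in this thread.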
Eran
Andrew