On 8/18/20 6:24 PM, Jakub Kicinski wrote:
> On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
>> From: Ido Schimmel <ido...@nvidia.com>
>>
>> This patch set extends devlink to allow device drivers to expose device
>> metrics to user space in a standard and extensible fashion, as opposed
>> to the driver-specific debugfs approach.
>
> I feel like all those loose hardware interfaces are a huge maintenance
> burden. I don't know what the solution is, but the status quo is not
> great.

I don't agree with the 'loose' characterization. Ido and team are
pushing what is arguably a modern version of `ethtool -S`, so it
provides a better API for retrieving data.

>
> I spend way too much time patrolling ethtool -S outputs already.
>

But that's the nature of detailed stats which are often essential to
ensuring the system is operating as expected or debugging some problem.
Commonality is certainly desired in names when relevant to be able to
build tooling around the stats.

As an example, per-queue stats have been essential to me for recent
investigations. ethq has been really helpful in crossing NIC vendors and
viewing those stats as it handles the per-vendor naming differences, but
it requires changes to show anything else - errors per queue, xdp stats,
drops, etc. This part could be simpler.

As for this set, I believe the metrics exposed here are more unique to
switch ASICs. At least one company I know of has built a business model
around exposing detailed telemetry of switch ASICs, so clearly some find
them quite valuable.
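To make the per-vendor naming point concrete, here is a rough Python sketch of the kind of name normalization a tool has to do on `ethtool -S` style output. This is not how ethq is implemented. The `parse_queue_stats` helper and the three name patterns are hypothetical and chosen only for illustration; real drivers use other forms as well, which is exactly why such a tool needs changes to show anything new.

```python
import re
from collections import defaultdict

# ASSUMPTION: illustrative per-queue naming schemes only, not a catalogue
# of what any particular driver actually reports.
QUEUE_PATTERNS = [
    re.compile(r"^(?P<dir>rx|tx)_queue_(?P<q>\d+)_(?P<stat>\w+)$"),  # rx_queue_0_packets
    re.compile(r"^(?P<dir>rx|tx)(?P<q>\d+)_(?P<stat>\w+)$"),         # rx0_packets
    re.compile(r"^(?P<dir>rx|tx)-(?P<q>\d+)\.(?P<stat>\w+)$"),       # rx-0.packets
]


def parse_queue_stats(text):
    """Group 'name: value' lines by (direction, queue index).

    Lines that are not integer counters, or whose names match none of the
    patterns above (e.g. device-wide counters), are skipped.
    """
    queues = defaultdict(dict)
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue
        try:
            count = int(value)
        except ValueError:
            continue  # header line such as "NIC statistics:"
        for pattern in QUEUE_PATTERNS:
            match = pattern.match(name.strip())
            if match:
                key = (match["dir"], int(match["q"]))
                queues[key][match["stat"]] = count
                break
    return dict(queues)


if __name__ == "__main__":
    sample = """NIC statistics:
     rx_packets: 100
     rx_queue_0_packets: 60
     rx0_bytes: 4096
     tx-1.packets: 7
"""
    for (direction, queue), stats in sorted(parse_queue_stats(sample).items()):
        print(direction, queue, stats)
```

Every new counter family (per-queue errors, xdp stats, drops) means another pattern or another mapping in a table like this, which is the maintenance cost a common naming scheme would avoid.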