On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu <ro...@cumulusnetworks.com> wrote: > > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski > <jakub.kicin...@netronome.com> wrote: > > > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote: > > > On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski wrote: > > > > Hi! > > > > > > > > As I tried to explain in my slides at netconf 2018 we are lacking > > > > an expressive, standard API to report device statistics. > > > > > > > > Networking silicon generally maintains some IEEE 802.3 and/or RMON > > > > statistics. Today those all end up in ethtool -S. Here is a simple > > > > attempt (admittedly very imprecise) of counting how many names driver > > > > authors invented for IETF RFC2819 etherStatsPkts512to1023Octets > > > > statistics (RX and TX): > > > > > > > > $ git grep '".*512.*1023.*"' -- drivers/net/ | \ > > > > sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l > > > > 63 > > > > > > > > Interestingly only two drivers in the tree use the name the standard > > > > gave us (etherStatsPkts512to1023, modulo case). > > > > > > > > I set out to working on this set in an attempt to give drivers a way > > > > to express clearly to user space standard-compliant counters. > > > > > > > > Second most common use for custom statistics is per-queue counters. > > > > This is where the "hierarchical" part of this set comes in, as > > > > groups can be nested, and user space tools can handle the aggregation > > > > inside the groups if needed. > > > > > > > > This set also tries to address the problem of users not knowing if > > > > a statistic is reported by hardware or the driver. Many modern drivers > > > > use some prefix in ethtool -S to indicate MAC/PHY stats. At a quick > > > > glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy". > > > > In this set, netlink attributes describe whether a group of statistics > > > > is RX or TX, maintained by device or driver. > > > > > > > > The purpose of this patch set is _not_ to replace ethtool -S. It is > > > > an incredibly useful tool, and we will certainly continue using it. > > > > However, for standard-based and commonly maintained statistics a more > > > > structured API seems warranted. > > > > > > > > There are two things missing from these patches, which I initially > > > > planned to address as well: filtering, and refresh rate control. > > > > > > > > Filtering doesn't need much explanation, users should be able to request > > > > only a subset of statistics (like only SW stats or only given ID). The > > > > bitmap of statistics in each group is there for filtering later on. > > > > > > > > By refresh control I mean the ability for user space to indicate how > > > > "fresh" values it expects. Sometimes reading the HW counters requires > > > > slow register reads or FW communication, in such cases drivers may cache > > > > the result. (Privileged) user space should be able to add a "not older > > > > than" timestamp to indicate how fresh statistics it expects. And vice > > > > versa, drivers can then also put the timestamp of when the statistics > > > > were last refreshed in the dump for more precise bandwidth estimation. > > > > > > Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you > > > mention 'partial' support for ethtool stats. I understand the reason > > > you say its partial. > > > But while at it, why not also include the ability to have driver > > > extensible stats here ? ie make it complete. We have talked about > > > making all hw stats available > > > via the RTM_*STATS api in the past..., so just want to make sure the > > > new HSTATS infra you are adding to the RTM_*STATS api > > > covers or at-least makes it possible to include driver extensible > > > stats in the future where the driver gets to define the stats id + > > > value (This is very useful). > > > It would be nice if you can account for that in this new HSTATS API. > > > > My thinking was that we should leave truly custom/strange stats to > > ethtool API which works quite well for that and at the same time be > > very accepting of people adding new IDs to HSTAT (only requirement is > > basically defining the meaning very clearly). > > that sounds reasonable. But the 'defining meaning clearly' gets tricky > sometimes. > The vendor who gets their ID or meaning first wins :) and the rest > will have to live with > ethtool and explain to rest of the world that ethtool is more reliable > for their hardware :) > > I am also concerned that this getting the ID into common HSTAT ID > space will slow down the process of adding new counters > for vendors. Which will lead to vendors sticking with ethtool API. It > would be great if people can get all stats in one place and not rely > on another API for 'more'. > > > > > For the first stab I looked at two drivers and added all the stats that > > were common. > > > > Given this set is identifying statistics by ID - how would we make that > > extensible to drivers? Would we go back to strings or have some > > "driver specific" ID space? > > I was looking for ideas from you really, to see if you had considered > this. agree per driver ID space seems ugly. > ethtool strings are great today...if we can control the duplication. > But thinking some more..., i did see some > patches recently for vendor specific parameter (with ID) space in > devlink. maybe something like that will be > reasonable ? > > > > > Is there any particular type of statistic you'd expect drivers to want > > to add? For NICs I think IEEE/RMON should pretty much cover the > > silicon ones, but I don't know much about switches :) > > I will have to go through the list. But switch asics do support > flexible stats/counters that can be attached at various points. > And new chip versions come with more support. Having that flexibility > to expose/extend such stats incrementally is very valuable on a per > hardware/vendor basis.
Just want to clarify that I am suggesting a nested HSTATS extension infra for drivers (just like ethtool). 'Common stats' stays at the top-level.