On Wed, May 1, 2019 at 6:11 AM Eran Ben Elisha <era...@mellanox.com> wrote:
>
> On 4/30/2019 9:50 PM, Cong Wang wrote:
> > Although devlink health report does a nice job on reporting TX
> > timeout and other NIC errors, unfortunately it requires drivers
> > to support it but currently only mlx5 has implemented it.
>
> The devlink health was never intended to be the generic mechanism for
> monitoring all driver's TX timeouts notifications. mlx5e driver chose to
> handle TX timeout notification by reporting it to the newly devlink
> health mechanism.

Understood.

>
> > Before other drivers could catch up, it is useful to have a
> > generic tracepoint to monitor this kind of TX timeout. We have
> > been suffering TX timeout with different drivers, we plan to
> > start to monitor it with rasdaemon which just needs a new tracepoint.
>
> Great idea to suggest a generic trace message that can be monitored over
> all drivers.
>
> > Sample output:
> >
> >   ksoftirqd/1-16 [001] ..s2 144.043173: net_dev_xmit_timeout:
> >   dev=ens3 driver=e1000 queue=0
> >
> > Cc: Eran Ben Elisha <era...@mellanox.com>
> > Cc: Jiri Pirko <j...@mellanox.com>
> > Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
> > ---
> >  include/trace/events/net.h | 23 +++++++++++++++++++++++
> >  net/sched/sch_generic.c    |  2 ++
> >  2 files changed, 25 insertions(+)
> >
> > diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> > index 1efd7d9b25fe..002d6f04b9e5 100644
> > --- a/include/trace/events/net.h
> > +++ b/include/trace/events/net.h
> > @@ -303,6 +303,29 @@ DEFINE_EVENT(net_dev_rx_exit_template,
> > netif_receive_skb_list_exit,
> >         TP_ARGS(ret)
> >  );
> >
>
> I would have put this next to net_dev_xmit trace event declaration.
>

Sounds reasonable, it would be slightly easier to find it.

I will send v2.

Thanks.