On Fri, Mar 15, 2019 at 10:56 AM Petr Machata <pe...@mellanox.com> wrote: > > In general, after a port is put administratively up, certain handshake > protocols have to finish successfully, otherwise the port is left in a > NO-CARRIER or DORMANT state. When that happens, it would be useful to > communicate the reasons to the administrator, so that the problem that > prevents the link from being established can be corrected. > > Introduce two new link attributes: IFLA_LINK_DOWN_REASON_MAJOR and > _MINOR. Major reason code serve as broad categories intended to convey a > general idea of where the problem is. Minor codes are arbitrary numbers > specific for the driver that add detail to the major reasons. Add enum > rtnl_link_down_reason_major to define the well-known major reason codes. > > The party with visibility into details of this process is the driver. > Therefore add two new RTNL hooks, link_down_reason_get_size and > fill_link_down_reason, to provide the necessary information. > > Link-down reason is not included if the port is up or administratively > down, because those two state are easy to discover through existing > interfaces. > > Signed-off-by: Petr Machata <pe...@mellanox.com>
This looks good and will be use-full. But i have some comments on the implementation below. Also, carrier can go down due to protocol down (IFLA_PROTODOWN). And I get asked about supporting reason codes or protocol owner for such protodown reason (I have not given it much thought yet. I will see if there is a way to use your series for that as well). > --- > include/net/rtnetlink.h | 3 +++ > include/uapi/linux/if_link.h | 16 ++++++++++++++++ > net/core/rtnetlink.c | 22 ++++++++++++++++++++++ > 3 files changed, 41 insertions(+) > > diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h > index e2091bb2b3a8..cfd9e86ff0ca 100644 > --- a/include/net/rtnetlink.h > +++ b/include/net/rtnetlink.h > @@ -110,6 +110,9 @@ struct rtnl_link_ops { > int (*fill_linkxstats)(struct sk_buff *skb, > const struct net_device > *dev, > int *prividx, int attr); > + size_t (*link_down_reason_get_size)(const struct > net_device *dev); > + int (*fill_link_down_reason)(struct sk_buff *skb, > + const struct > net_device *dev); > }; Any reason to restrict this to network interfaces which support rtnl_link_ops ?. I also saw that you added rtnl_link_ops to mlxsw for this. Which also means every driver will now have to declare rtnl_link_ops to use this ?. I think we should keep rtnl_link_ops to logical links like bridge, bonds etc (ie which support newlink and dellink). Can't we use netdev_ops for this ?. That will allow any driver to just support this readily. > > int __rtnl_link_register(struct rtnl_link_ops *ops); > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 5b225ff63b48..a47f42e79741 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -167,6 +167,8 @@ enum { > IFLA_NEW_IFINDEX, > IFLA_MIN_MTU, > IFLA_MAX_MTU, > + IFLA_LINK_DOWN_REASON_MAJOR, /* enum rtnl_link_down_reason_major */ > + IFLA_LINK_DOWN_REASON_MINOR, > __IFLA_MAX > }; > > @@ -239,6 +241,20 @@ enum in6_addr_gen_mode { > IN6_ADDR_GEN_MODE_RANDOM, > }; > > +enum rtnl_link_down_reason_major { > + RTNL_LDR_OTHER, > + RTNL_LDR_NO_CABLE, > + RTNL_LDR_UNSUPPORTED_CABLE, > + RTNL_LDR_AUTONEG_FAILURE, > + RTNL_LDR_NO_LINK_PARTNER, > + RTNL_LDR_LINK_TRAINING_FAILURE, > + RTNL_LDR_LOGICAL_MISMATCH, > + RTNL_LDR_REMOTE_FAULT, > + RTNL_LDR_BAD_SIGNAL_INTEGRITY, > + RTNL_LDR_CALIBRATION_FAILURE, > + RTNL_LDR_POWER_BUDGET_EXCEEDED, > +}; prefer LINK_DOWN_REASON_* or LINKDOWN_REASON_* (Though there is no predefined convention, the prefix RTNL makes it feel like a top-level attribute when its really a value for an IFLA_* attribute.) > + > /* Bridge section */ > > enum { > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index a51cab95ba64..206795f13850 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -979,6 +979,22 @@ static size_t rtnl_xdp_size(void) > return xdp_size; > } > > +static bool rtnl_should_fill_link_down_reason(const struct net_device *dev) > +{ > + return (dev->flags & IFF_UP) && !netif_oper_up(dev) && > + dev->rtnl_link_ops && > + dev->rtnl_link_ops->link_down_reason_get_size && > + dev->rtnl_link_ops->fill_link_down_reason; > +} > + > +static size_t rtnl_link_down_reason_get_size(const struct net_device *dev) > +{ > + if (!rtnl_should_fill_link_down_reason(dev)) > + return 0; > + > + return dev->rtnl_link_ops->link_down_reason_get_size(dev); > +} > + > static noinline size_t if_nlmsg_size(const struct net_device *dev, > u32 ext_filter_mask) > { > @@ -1026,6 +1042,7 @@ static noinline size_t if_nlmsg_size(const struct > net_device *dev, > + nla_total_size(4) /* IFLA_CARRIER_DOWN_COUNT */ > + nla_total_size(4) /* IFLA_MIN_MTU */ > + nla_total_size(4) /* IFLA_MAX_MTU */ > + + rtnl_link_down_reason_get_size(dev) > + 0; > } > > @@ -1683,6 +1700,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, > nla_put_s32(skb, IFLA_NEW_IFINDEX, new_ifindex) < 0) > goto nla_put_failure; > > + if (rtnl_should_fill_link_down_reason(dev) && > + dev->rtnl_link_ops->fill_link_down_reason(skb, dev)) > + goto nla_put_failure; > > rcu_read_lock(); > if (rtnl_fill_link_af(skb, dev, ext_filter_mask)) > @@ -1742,6 +1762,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] > = { > [IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 }, > [IFLA_MIN_MTU] = { .type = NLA_U32 }, > [IFLA_MAX_MTU] = { .type = NLA_U32 }, > + [IFLA_LINK_DOWN_REASON_MAJOR] = { .type = NLA_U32 }, > + [IFLA_LINK_DOWN_REASON_MINOR] = { .type = NLA_U32 }, > }; > > static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { > -- > 2.4.11 >