On Sun, Oct 07, 2018 at 08:16:21PM -0700, David Ahern wrote: > From: David Ahern <dsah...@gmail.com> > > There are many use cases where a user wants to influence what is > returned in a dump for some rtnetlink command: one is wanting data > for a different namespace than the one the request is received and > another is limiting the amount of data returned in the dump to a > specific set of interest to userspace, reducing the cpu overhead of > both kernel and userspace. Unfortunately, the kernel has historically > not been strict with checking for the proper header or checking the > values passed in the header. This lenient implementation has allowed > iproute2 and other packages to pass any struct or data in the dump > request as long as the family is the first byte. For example, ifinfomsg > struct is used by iproute2 for all generic dump requests - links, > addresses, routes and rules when it is really only valid for link > requests. > > There is 1 is example where the kernel deals with the wrong struct: link > dumps after VF support was added. Older iproute2 was sending rtgenmsg as > the header instead of ifinfomsg so a patch was added to try and detect > old userspace vs new: > e5eca6d41f53 ("rtnetlink: fix userspace API breakage for iproute2 < v3.9.0") > > The latest example is Christian's patch set wanting to return addresses for > a target namespace. It guesses the header struct is an ifaddrmsg and if it > guesses wrong a netlink warning is generated in the kernel log on every > address dump which is unacceptable. > > Another example where the kernel is a bit lenient is route dumps: iproute2 > can send either a request with either ifinfomsg or a rtmsg as the header > struct, yet the kernel always treats the header as an rtmsg (see > inet_dump_fib and rtm_flags check). The header inconsistency impacts the > ability to add kernel side filters for route dumps - a necessary feature > for scale setups with 100k+ routes. > > How to resolve the problem of not breaking old userspace yet be able to > move forward with new features such as kernel side filtering which are > crucial for efficient operation at high scale? > > This patch set addresses the problem by adding a new socket flag, > NETLINK_DUMP_STRICT_CHK, that userspace can use with setsockopt to > request strict checking of headers and attributes on dump requests and > hence unlock the ability to use kernel side filters as they are added. > > Kernel side, the dump handlers are updated to verify the message contains > at least the expected header struct: > RTM_GETLINK: ifinfomsg > RTM_GETADDR: ifaddrmsg > RTM_GETMULTICAST: ifaddrmsg > RTM_GETANYCAST: ifaddrmsg > RTM_GETADDRLABEL: ifaddrlblmsg > RTM_GETROUTE: rtmsg > RTM_GETSTATS: if_stats_msg > RTM_GETNEIGH: ndmsg > RTM_GETNEIGHTBL: ndtmsg > RTM_GETNSID: rtgenmsg > RTM_GETRULE: fib_rule_hdr > RTM_GETNETCONF: netconfmsg > RTM_GETMDB: br_port_msg > > And then every field in the header struct should be 0 with the exception > of the family. There are a few exceptions to this rule where the kernel > already influences the data returned by values in the struct. Next the > message should not contain attributes unless the kernel implements > filtering for it. Any unexpected data causes the dump to fail with EINVAL. > If the new flag is honored by the kernel and the dump contents adjusted > by any data passed in the request, the dump handler can set the > NLM_F_DUMP_FILTERED flag in the netlink message header. > > For old userspace on new kernel there is no impact as all checks are > wrapped in a check on the new strict flag. For new userspace on old > kernel, the data in the headers and any appended attributes are > silently ignored though the setsockopt failing is the clue to userspace > the feature is not supported. New userspace on new kernel gets the > requested data dump. > > iproute2 patches can be found here: > https://github.com/dsahern/iproute2 dump-enhancements > > Major changes since v1 > - inner header is supposed to be 4-bytes aligned. So for dumps that > should not have attributes appended changed the check to use: > if (nlmsg_attrlen(nlh, sizeof(hdr))) > Only impacts patches with headers that are not multiples of 4-bytes > (rtgenmsg, netconfmsg), but applied the change to all patches not > calling nlmsg_parse for consistency. > > - Added nlmsg_parse_strict and nla_parse_strict for tighter control on > attribute parsing. There should be no unknown attribute types or extra > bytes. > > - Moved validation to a helper in most cases > > Changes since rfc-v2 > - dropped the NLM_F_DUMP_FILTERED flag from target nsid dumps per > Jiri's objections > - changed the opt-in uapi from a netlink message flag to a socket > flag. setsockopt provides an api for userspace to definitively > know if the kernel supports strict checking on dumps. > - re-ordered patches to peel off the extack on dumps if needed to > keep this set size within limits > - misc cleanups in patches based on testing > > David Ahern (23): > netlink: Pass extack to dump handlers > netlink: Add extack message to nlmsg_parse for invalid header length > net: Add extack to nlmsg_parse > netlink: Add strict version of nlmsg_parse and nla_parse > net/ipv6: Refactor address dump to push inet6_fill_args to > in6_dump_addrs > netlink: Add new socket option to enable strict checking on dumps > net/ipv4: Update inet_dump_ifaddr for strict data checking > net/ipv6: Update inet6_dump_addr for strict data checking > rtnetlink: Update rtnl_dump_ifinfo for strict data checking > rtnetlink: Update rtnl_bridge_getlink for strict data checking > rtnetlink: Update rtnl_stats_dump for strict data checking > rtnetlink: Update inet6_dump_ifinfo for strict data checking > rtnetlink: Update ipmr_rtm_dumplink for strict data checking > rtnetlink: Update fib dumps for strict data checking > net/neighbor: Update neigh_dump_info for strict data checking > net/neighbor: Update neightbl_dump_info for strict data checking > net/namespace: Update rtnl_net_dumpid for strict data checking > net/fib_rules: Update fib_nl_dumprule for strict data checking > net/ipv6: Update ip6addrlbl_dump for strict data checking > net: Update netconf dump handlers for strict data checking > net/bridge: Update br_mdb_dump for strict data checking > rtnetlink: Move input checking for rtnl_fdb_dump to helper > rtnetlink: Update rtnl_fdb_dump for strict data checking
At this point it's all nits so it's got my ACK but keener eyes than mine might see other issues. Acked-by: Christian Brauner <christ...@brauner.io> > > include/linux/netlink.h | 2 + > include/net/ip_fib.h | 2 + > include/net/netlink.h | 21 ++- > include/uapi/linux/netlink.h | 1 + > lib/nlattr.c | 48 +++++-- > net/bridge/br_mdb.c | 30 ++++ > net/core/devlink.c | 2 +- > net/core/fib_rules.c | 36 ++++- > net/core/neighbour.c | 119 ++++++++++++--- > net/core/net_namespace.c | 6 + > net/core/rtnetlink.c | 318 > ++++++++++++++++++++++++++++++++--------- > net/ipv4/devinet.c | 101 ++++++++++--- > net/ipv4/fib_frontend.c | 42 +++++- > net/ipv4/ipmr.c | 39 +++++ > net/ipv6/addrconf.c | 177 ++++++++++++++++++----- > net/ipv6/addrlabel.c | 34 ++++- > net/ipv6/ip6_fib.c | 8 ++ > net/ipv6/ip6mr.c | 9 ++ > net/ipv6/route.c | 2 +- > net/mpls/af_mpls.c | 28 +++- > net/netfilter/ipvs/ip_vs_ctl.c | 2 +- > net/netlink/af_netlink.c | 33 ++++- > net/netlink/af_netlink.h | 1 + > net/sched/act_api.c | 2 +- > net/sched/cls_api.c | 6 +- > net/sched/sch_api.c | 2 +- > net/xfrm/xfrm_user.c | 2 +- > 27 files changed, 908 insertions(+), 165 deletions(-) > > -- > 2.11.0 >