Christian Brauner <christian.brau...@canonical.com> writes: > On Wed, Apr 04, 2018 at 09:48:57PM +0200, Christian Brauner wrote: >> commit 07e98962fa77 ("kobject: Send hotplug events in all network >> namespaces") >> >> enabled sending hotplug events into all network namespaces back in 2010. >> Over time the set of uevents that get sent into all network namespaces has >> shrunk. We have now reached the point where hotplug events for all devices >> that carry a namespace tag are filtered according to that namespace. >> >> Specifically, they are filtered whenever the namespace tag of the kobject >> does not match the namespace tag of the netlink socket. One example are >> network devices. Uevents for network devices only show up in the network >> namespaces these devices are moved to or created in. >> >> However, any uevent for a kobject that does not have a namespace tag >> associated with it will not be filtered and we will *try* to broadcast it >> into all network namespaces. >> >> The original patchset was written in 2010 before user namespaces were a >> thing. With the introduction of user namespaces sending out uevents became >> partially isolated as they were filtered by user namespaces: >> >> net/netlink/af_netlink.c:do_one_broadcast() >> >> if (!net_eq(sock_net(sk), p->net)) { >> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID)) >> return; >> >> if (!peernet_has_id(sock_net(sk), p->net)) >> return; >> >> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns, >> CAP_NET_BROADCAST)) >> j return; >> } >> >> The file_ns_capable() check will check whether the caller had >> CAP_NET_BROADCAST at the time of opening the netlink socket in the user >> namespace of interest. This check is fine in general but seems insufficient >> to me when paired with uevents. The reason is that devices always belong to >> the initial user namespace so uevents for kobjects that do not carry a >> namespace tag should never be sent into another user namespace. This has >> been the intention all along. But there's one case where this breaks, >> namely if a new user namespace is created by root on the host and an >> identity mapping is established between root on the host and root in the >> new user namespace. Here's a reproducer: >> >> sudo unshare -U --map-root >> udevadm monitor -k >> # Now change to initial user namespace and e.g. do >> modprobe kvm >> # or >> rmmod kvm >> >> will allow the non-initial user namespace to retrieve all uevents from the >> host. This seems very anecdotal given that in the general case user >> namespaces do not see any uevents and also can't really do anything useful >> with them. >> >> Additionally, it is now possible to send uevents from userspace. As such we >> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user >> namespace of the network namespace of the netlink socket) userspace process >> make a decision what uevents should be sent. >> >> This makes me think that we should simply ensure that uevents for kobjects >> that do not carry a namespace tag are *always* filtered by user namespace >> in kobj_bcast_filter(). Specifically: >> - If the owning user namespace of the uevent socket is not init_user_ns the >> event will always be filtered. >> - If the network namespace the uevent socket belongs to was created in the >> initial user namespace but was opened from a non-initial user namespace >> the event will be filtered as well. >> Put another way, uevents for kobjects not carrying a namespace tag are now >> always only sent to the initial user namespace. The regression potential >> for this is near to non-existent since user namespaces can't really do >> anything with interesting devices. >> >> Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com> > > That was supposed to be [PATCH net] not [PATCH net-next] which is > obviously closed. Sorry about that.
This does not appear to be a fix. This looks like feature work. The motivation appears to be that looks wrong let's change it. So let's please leave this for when net-next opens again so we can have time to fully consider a change in semantics. Thank you, Eric