On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote: > On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote: > > On 29.08.2018 21:13, Christian Brauner wrote: > > > Hi Kirill, > > > > > > Thanks for the question! > > > > > > On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote: > > >> Hi, Christian, > > >> > > >> On 29.08.2018 02:18, Christian Brauner wrote: > > >>> From: Christian Brauner <christ...@brauner.io> > > >>> > > >>> Hey, > > >>> > > >>> A while back we introduced and enabled IFLA_IF_NETNSID in > > >>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has > > >>> led > > >>> to signficant performance increases since it allows userspace to avoid > > >>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the > > >>> interfaces from the netns associated with the netns_fd. Especially when > > >>> a > > >>> lot of network namespaces are in use, using setns() becomes increasingly > > >>> problematic when performance matters. > > >> > > >> could you please give a real example, when setns()+socket(AF_NETLINK) > > >> cause > > >> problems with the performance? You should do this only once on > > >> application > > >> startup, and then you have created netlink sockets in any net namespaces > > >> you > > >> need. What is the problem here? > > > > > > So we have a daemon (LXD) that is often running thousands of containers. > > > When users issue a lxc list request against the daemon it returns a list > > > of all containers including all of the interfaces and addresses for each > > > container. To retrieve those addresses we currently rely on setns() + > > > getifaddrs() for each of those containers. That has horrible > > > performance. > > > > Could you please provide some numbers showing that setns() > > introduces signify performance decrease in the application? > > Sure, might take a few days++ though since I'm traveling.
Hey Kirill, As promised here's some code [1] that compares performance. I basically did a setns() to the network namespace and called getifaddrs() and compared this to the scenario where I use the newly introduced property. I did this 1 million times and calculated the mean getifaddrs() retrieval time based on that. My patch cuts the time in half. brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178 Mean time in microseconds (netnsid): 81 Mean time in microseconds (setns): 162 Christian I'm only appending the main file since the netsns_getifaddrs() code I used is pretty long: [1]: #define _GNU_SOURCE #define __STDC_FORMAT_MACROS #include <fcntl.h> #include <inttypes.h> #include <linux/types.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/stat.h> #include <sys/time.h> #include <sys/types.h> #include <unistd.h> #include "netns_getifaddrs.h" #include "print_getifaddrs.h" #define ITERATIONS 1000000 #define SEC_TO_MICROSEC(x) (1000000 * (x)) int main(int argc, char *argv[]) { int i, ret; __s32 netns_id; pid_t netns_pid; char path[1024]; intmax_t times[ITERATIONS]; struct timeval t1, t2; intmax_t time_in_mcs; int fret = EXIT_FAILURE; intmax_t sum = 0; int host_netns_fd = -1, netns_fd = -1; struct ifaddrs *ifaddrs = NULL; if (argc != 3) goto on_error; netns_id = atoi(argv[1]); netns_pid = atoi(argv[2]); printf("%d\n", netns_id); printf("%d\n", netns_pid); for (i = 0; i < ITERATIONS; i++) { ret = gettimeofday(&t1, NULL); if (ret < 0) goto on_error; ret = netns_getifaddrs(&ifaddrs, netns_id); freeifaddrs(ifaddrs); if (ret < 0) goto on_error; ret = gettimeofday(&t2, NULL); if (ret < 0) goto on_error; time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) - (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec); times[i] = time_in_mcs; } for (i = 0; i < ITERATIONS; i++) sum += times[i]; printf("Mean time in microseconds (netnsid): %ju\n", sum / ITERATIONS); ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid); if (ret < 0 || (size_t)ret >= sizeof(path)) goto on_error; netns_fd = open(path, O_RDONLY | O_CLOEXEC); if (netns_fd < 0) goto on_error; host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC); if (host_netns_fd < 0) goto on_error; memset(times, 0, sizeof(times)); for (i = 0; i < ITERATIONS; i++) { ret = gettimeofday(&t1, NULL); if (ret < 0) goto on_error; ret = setns(netns_fd, CLONE_NEWNET); if (ret < 0) goto on_error; ret = getifaddrs(&ifaddrs); freeifaddrs(ifaddrs); if (ret < 0) goto on_error; ret = gettimeofday(&t2, NULL); if (ret < 0) goto on_error; ret = setns(host_netns_fd, CLONE_NEWNET); if (ret < 0) goto on_error; time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) - (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec); times[i] = time_in_mcs; } for (i = 0; i < ITERATIONS; i++) sum += times[i]; printf("Mean time in microseconds (setns): %ju\n", sum / ITERATIONS); fret = EXIT_SUCCESS; on_error: if (netns_fd >= 0) close(netns_fd); if (host_netns_fd >= 0) close(host_netns_fd); exit(fret); } > > > > > I worry about all this just because of netlink interface is > > already overloaded, while this patch makes it even less modular. > > Introducing the IFA_IF_NETNSID property will not make the netlink > interface less modular. It is a clean, RTM_*ADDR-request specific > property using network namespace identifiers which we discussed in prior > patches are the way to go forward. > > You can already get interfaces via GETLINK from another network > namespaces than the one you reside in (Which we enabled just a few > months back.) but you can't do the same for GETADDR. Those two are > almost always used together. When you want to get the links you usually > also want to get the addresses associated with it right after. > In a prior discussion we agreed that network namespace identifiers are > the way to go forward but that any other propery, i.e. PIDs and fds > should never be ported into other parts of the codebase and that is > indeed something I agree with. > > > In case of one day we finally reach rtnl unscalability trap, > > every common interface like this may be a decisive nail in > > a coffin lid of possibility to overwrite everything. > > > > > The problem with what you're proposing is that the daemon would need to > > > cache a socket file descriptor for each container which is something > > > that we unfortunately cannot do since we can't excessively cache file > > > descriptors because we can easily hit the open file limit. We also > > > refrain from caching file descriptors for a long time for security > > > reasons. > > > > > > For the case where users just request a list of the interfaces we > > > can already use RTM_GETLINK + IFLA_IF_NETNS which has way better > > > performance. But we can't do the same with RTM_GETADDR requests which > > > was an oversight on my part when I wrote the original patchset for the > > > RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK + > > > RTM_GETADDR. > > > Based on this patchset I have written a userspace POC that is basically > > > a netns namespace aware getifaddr() or - as I like to call it - > > > netns_getifaddr(). > > > > > >> > > >>> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf. > > >>> getifaddrs() style functions and friends). But currently, RTM_GETADDR > > >>> requests do not support a similar property like IFLA_IF_NETNSID for > > >>> RTM_*LINK requests. > > >>> This is problematic since userspace can retrieve interfaces from another > > >>> network namespace by sending a IFLA_IF_NETNSID property along but > > >>> RTM_GETLINK request but is still forced to use the legacy setns() style > > >>> of > > >>> retrieving interfaces in RTM_GETADDR requests. > > >>> > > >>> The goal of this series is to make it possible to perform RTM_GETADDR > > >>> requests on different network namespaces. To this end a new > > >>> IFA_IF_NETNSID > > >>> property for RTM_*ADDR requests is introduced. It can be used to send a > > >>> network namespace identifier along in RTM_*ADDR requests. The network > > >>> namespace identifier will be used to retrieve the target network > > >>> namespace > > >>> in which the request is supposed to be fulfilled. This aligns the > > >>> behavior > > >>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests. > > >>> > > >>> Security: > > >>> - The caller must have assigned a valid network namespace identifier for > > >>> the target network namespace. > > >>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the > > >>> target network namespace. > > >>> > > >>> Thanks! > > >>> Christian > > >>> > > >>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID") > > >>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for > > >>> RTM_NEWLINK") > > >>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for > > >>> RTM_DELLINK") > > >>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for > > >>> RTM_SETLINK") > > >>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in > > >>> do_setlink()") > > >>> > > >>> Christian Brauner (5): > > >>> rtnetlink: add rtnl_get_net_ns_capable() > > >>> if_addr: add IFA_IF_NETNSID > > >>> ipv4: enable IFA_IF_NETNSID for RTM_GETADDR > > >>> ipv6: enable IFA_IF_NETNSID for RTM_GETADDR > > >>> rtnetlink: move type calculation out of loop > > >>> > > >>> include/net/rtnetlink.h | 1 + > > >>> include/uapi/linux/if_addr.h | 1 + > > >>> net/core/rtnetlink.c | 15 +++++--- > > >>> net/ipv4/devinet.c | 38 +++++++++++++++----- > > >>> net/ipv6/addrconf.c | 70 ++++++++++++++++++++++++++++-------- > > >>> 5 files changed, 97 insertions(+), 28 deletions(-) > > >>>