On Tue, Mar 13, 2018 at 8:39 PM, Alexei Starovoitov <a...@kernel.org> wrote:
> For our container management we've been using complicated and fragile setup
> consisting of LD_PRELOAD wrapper intercepting bind and connect calls from
> all containerized applications.
> The setup involves per-container IPs, policy, etc, so traditional
> network-only solutions that involve VRFs, netns, acls are not applicable.

You can keep the policies per cgroup but move the IP from the cgroup to a
net-ns, and then none of these eBPF hacks are required. Since cgroups and
namespaces are orthogonal, you can use cgroups in conjunction with
namespaces.

> Changing apps is not possible and LD_PRELOAD doesn't work
> for apps that don't use glibc like java and golang.
> BPF+cgroup looks to be the best solution for this problem.
> Hence we introduce 3 hooks:
> - at entry into sys_bind and sys_connect
>   to let bpf prog look and modify 'struct sockaddr' provided
>   by user space and fail bind/connect when appropriate
> - post sys_bind after port is allocated
>
> The approach works great and has zero overhead for anyone who doesn't
> use it and very low overhead when deployed.
>
> The main question for Daniel and Dave is what approach to take
> with prog types...
>
> In this patch set we introduce 6 new program types to make user
> experience easier:
>   BPF_PROG_TYPE_CGROUP_INET4_BIND,
>   BPF_PROG_TYPE_CGROUP_INET6_BIND,
>   BPF_PROG_TYPE_CGROUP_INET4_CONNECT,
>   BPF_PROG_TYPE_CGROUP_INET6_CONNECT,
>   BPF_PROG_TYPE_CGROUP_INET4_POST_BIND,
>   BPF_PROG_TYPE_CGROUP_INET6_POST_BIND,
>
> since v4 programs should not be using 'struct bpf_sock_addr'->user_ip6 fields
> and different prog type for v4 and v6 helps verifier reject such access
> at load time.
> Similarly bind vs connect are two different prog types too,
> since only sys_connect programs can call new bpf_bind() helper.
>
> This approach is very different from tcp-bpf where single
> 'struct bpf_sock_ops' and single prog type is used for different hooks.
> The field checks are done at run-time instead of load time.
>
> I think the approach taken by this patch set is justified,
> but we may do better if we extend BPF_PROG_ATTACH cmd
> with log_buf + log_size, then we should be able to combine
> bind+connect+v4+v6 into single program type.
> The idea that at load time the verifier will remember a bitmask
> of fields in bpf_sock_addr used by the program and helpers
> that program used, then at attach time we can check that
> hook is compatible with features used by the program and
> report human readable error message back via log_buf.
> We cannot do this right now with just EINVAL, since combinations
> of errors like 'using user_ip6 field but attaching to v4 hook'
> are too high to express as errno.
> This would be bigger change. If you folks think it's worth it
> we can go with this approach or if you think 6 new prog types
> is not too bad, we can leave the patch as-is.
> Comments?
> Other comments on patches are welcome.
>
> Andrey Ignatov (6):
>   bpf: Hooks for sys_bind
>   selftests/bpf: Selftest for sys_bind hooks
>   net: Introduce __inet_bind() and __inet6_bind
>   bpf: Hooks for sys_connect
>   selftests/bpf: Selftest for sys_connect hooks
>   bpf: Post-hooks for sys_bind
>
>  include/linux/bpf-cgroup.h                    |  68 +++-
>  include/linux/bpf_types.h                     |   6 +
>  include/linux/filter.h                        |  10 +
>  include/net/inet_common.h                     |   2 +
>  include/net/ipv6.h                            |   2 +
>  include/net/sock.h                            |   3 +
>  include/net/udp.h                             |   1 +
>  include/uapi/linux/bpf.h                      |  52 ++-
>  kernel/bpf/cgroup.c                           |  36 ++
>  kernel/bpf/syscall.c                          |  42 ++
>  kernel/bpf/verifier.c                         |   6 +
>  net/core/filter.c                             | 479 ++++++++++++++++++++++-
>  net/ipv4/af_inet.c                            |  60 ++-
>  net/ipv4/tcp_ipv4.c                           |  16 +
>  net/ipv4/udp.c                                |  14 +
>  net/ipv6/af_inet6.c                           |  47 ++-
>  net/ipv6/tcp_ipv6.c                           |  16 +
>  net/ipv6/udp.c                                |  20 +
>  tools/include/uapi/linux/bpf.h                |  39 +-
>  tools/testing/selftests/bpf/Makefile          |   8 +-
>  tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
>  tools/testing/selftests/bpf/connect4_prog.c   |  45 +++
>  tools/testing/selftests/bpf/connect6_prog.c   |  61 +++
>  tools/testing/selftests/bpf/test_sock_addr.c  | 541 ++++++++++++++++++++++++++
>  tools/testing/selftests/bpf/test_sock_addr.sh |  57 +++
>  25 files changed, 1580 insertions(+), 53 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/connect4_prog.c
>  create mode 100644 tools/testing/selftests/bpf/connect6_prog.c
>  create mode 100644 tools/testing/selftests/bpf/test_sock_addr.c
>  create mode 100755 tools/testing/selftests/bpf/test_sock_addr.sh
>
> --
> 2.9.5
>
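
For reference, the alternative floated in the cover letter (remember a
bitmask of used fields and helpers at load time, then check it against the
hook at attach time and report through log_buf) could look roughly like the
user-space sketch below. This is only an illustration, not code from the
patch set: the FEAT_* bits, the hook enum, the hook_allowed table and
check_attach() are all hypothetical names, and the choice of what each hook
permits (for example, v6 hooks rejecting user_ip4) is an assumption made
for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits a verifier could record at load time. */
enum {
	FEAT_USER_IP4    = 1u << 0,	/* program touches user_ip4 */
	FEAT_USER_IP6    = 1u << 1,	/* program touches user_ip6 */
	FEAT_BIND_HELPER = 1u << 2,	/* program calls bpf_bind() */
};

/* Hypothetical attach points; post-bind hooks are left out for brevity. */
enum hook {
	HOOK_INET4_BIND,
	HOOK_INET6_BIND,
	HOOK_INET4_CONNECT,
	HOOK_INET6_CONNECT,
	HOOK_MAX,
};

static const char *const hook_names[HOOK_MAX] = {
	"inet4_bind", "inet6_bind", "inet4_connect", "inet6_connect",
};

/* Assumed policy: which features each hook accepts. */
static const uint32_t hook_allowed[HOOK_MAX] = {
	[HOOK_INET4_BIND]    = FEAT_USER_IP4,
	[HOOK_INET6_BIND]    = FEAT_USER_IP6,
	[HOOK_INET4_CONNECT] = FEAT_USER_IP4 | FEAT_BIND_HELPER,
	[HOOK_INET6_CONNECT] = FEAT_USER_IP6 | FEAT_BIND_HELPER,
};

static const struct {
	uint32_t bit;
	const char *name;
} feats[] = {
	{ FEAT_USER_IP4,    "user_ip4 access" },
	{ FEAT_USER_IP6,    "user_ip6 access" },
	{ FEAT_BIND_HELPER, "bpf_bind() helper" },
};

/*
 * Attach-time check: returns 0 if every feature in 'used' is allowed on
 * hook 'h', otherwise -EINVAL with one line per offending feature written
 * to log_buf (truncated if log_size is too small).
 */
static int check_attach(uint32_t used, enum hook h,
			char *log_buf, size_t log_size)
{
	uint32_t bad;
	size_t off = 0;
	size_t i;

	assert((unsigned int)h < HOOK_MAX);
	if (log_buf && log_size)
		log_buf[0] = '\0';

	bad = used & ~hook_allowed[h];
	if (!bad)
		return 0;

	for (i = 0; log_buf && i < sizeof(feats) / sizeof(feats[0]); i++) {
		int n;

		if (!(bad & feats[i].bit))
			continue;
		n = snprintf(log_buf + off, log_size - off,
			     "%s is not allowed on the %s hook\n",
			     feats[i].name, hook_names[h]);
		if (n < 0 || (size_t)n >= log_size - off)
			break;	/* log buffer is full, stop appending */
		off += (size_t)n;
	}
	return -EINVAL;
}
```

With this shape, a program whose recorded mask is FEAT_USER_IP6 would be
rejected on HOOK_INET4_BIND with a readable message in log_buf, while the
same mask attaches cleanly to HOOK_INET6_BIND. That is the case the cover
letter says a bare EINVAL cannot describe.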