On Thu, Aug 04, 2016 at 11:45:08AM +0200, Daniel Borkmann wrote:
> Hi Sargun,
>
> On 08/04/2016 09:11 AM, Sargun Dhillon wrote:
> [...]
> > [It's a] minor LSM. My particular use case is one in which containers are
> > being dynamically deployed to machines by internal developers in a
> > different group.
> [...]
> > For many of these containers, the security policies can be fairly nuanced.
> > One particular one to take into account is network security. Often times,
> > administrators want to prevent ingress, and egress connectivity except
> > from a few select IPs. Egress filtering can be managed using net_cls, but
> > without modifying running software, it's non-trivial to attach a filter
> > to all sockets being created within a container. The inet_conn_request,
> > socket_recvmsg, socket_sock_rcv_skb hooks make this trivial to implement.
>
> I'm not too familiar with LSMs, but afaik, when you install such policies
> they are effectively global, right? How would you install/manage such
> policies per container?
>
> On a quick glance, this would then be the job of the BPF proglet from the
> global hook, no? If yes, then the BPF contexts the BPF prog works with seem
> rather quite limited ...

You're right, they are global hooks. If you want the policy to be specific to
a given cgroup or namespace, you'd have to introduce a level of indirection
through a prog_array or some such. There are still cases (the CVE, and the
Docker bind case) where you want global isolation. The other big aspect is
being able to implement application-specific LSMs without requiring kmods
(a la hardchroot).
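To make the indirection concrete, here is a rough sketch of what I have in
mind. The section name, the map layout and the way the index gets derived are
made up purely for illustration -- only the ctx struct name comes from the
patch:

#include <linux/bpf.h>
#include "bpf_helpers.h"

struct checmate_socket_connect_ctx;

/* One prog_array slot per container policy; populated from userspace. */
struct bpf_map_def SEC("maps") policy_progs = {
	.type		= BPF_MAP_TYPE_PROG_ARRAY,
	.key_size	= sizeof(unsigned int),
	.value_size	= sizeof(unsigned int),
	.max_entries	= 64,
};

SEC("checmate/socket_connect")
int dispatch(struct checmate_socket_connect_ctx *ctx)
{
	/* Hypothetical: index derived from a cgroup/namespace identifier,
	 * e.g. via a bpf_current_in_cgroup()-style helper. */
	unsigned int idx = 0;

	bpf_tail_call(ctx, &policy_progs, idx);

	/* No per-container program installed: fall through and allow. */
	return 0;
}

char _license[] SEC("license") = "GPL";

Userspace would install one policy program per container into policy_progs;
anything without an entry just falls through to whatever the global program
decides.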
>
> +struct checmate_file_open_ctx {
> +	struct file *file;
> +	const struct cred *cred;
> +};
> +
> +struct checmate_task_create_ctx {
> +	unsigned long clone_flags;
> +};
> +
> +struct checmate_task_free_ctx {
> +	struct task_struct *task;
> +};
> +
> +struct checmate_socket_connect_ctx {
> +	struct socket *sock;
> +	struct sockaddr *address;
> +	int addrlen;
> +};
>
> ... or are you using bpf_probe_read() in some way to walk 'current' to
> retrieve a namespace from there somehow? Like via nsproxy? But how you make
> sense of this for defining a per container policy?

In my prototype code, I'm using uts namespace + hostname, and I'm extracting
that via the bpf_probe_read walk (rough sketch at the end of this mail).
You're right, that's less than awesome. In the longer term, I'd hope we'd be
able to add a helper like bpf_current_in_cgroup (a la bpf_skb_in_cgroup). The
idea is that we'd add enough helpers to avoid this.

I can submit some more example BPF programs if that'd help. Off the top of my
head:

* current_in_cgroup
* introduce a struct pid map
* introduce helpers to inspect the common datatypes passed to the hooks -- if
  you look at something like the net hooks, there aren't actually that many
  datatypes being passed around
* introduce an example top-level program that maps cgroup -> tail_call into
  other programs

> Do you see a way where we don't need to define so many different ctx each
> time?
>
> My other concern from a security PoV is that when using things like
> bpf_probe_read() we're dependent on kernel structs and there's a risk that
> when people migrate such policies that expectations break due to underlying
> structs changed. I see you've addressed that in patch 4 to place a small
> stone in the way, yeah kinda works. It's mostly a reminder that this is not
> stable ABI.
>
> Thanks,
> Daniel
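Re the bpf_probe_read walk: for reference, what my prototype does looks
roughly like the below. Sketch only -- it assumes a bpf_get_current_task()
helper is available to the program type, and the struct offsets it bakes in
are exactly the kernel-struct dependence you're pointing at:

#include <linux/sched.h>
#include <linux/nsproxy.h>
#include <linux/utsname.h>
#include "bpf_helpers.h"

/* Pull the UTS nodename out of 'current' so a policy can key off the
 * container's hostname. Every step here depends on kernel struct layout. */
static __always_inline int current_nodename(char *buf, int len)
{
	struct task_struct *task;
	struct nsproxy *nsproxy;
	struct uts_namespace *uts;

	task = (struct task_struct *)bpf_get_current_task();
	bpf_probe_read(&nsproxy, sizeof(nsproxy), &task->nsproxy);
	bpf_probe_read(&uts, sizeof(uts), &nsproxy->uts_ns);
	return bpf_probe_read(buf, len, uts->name.nodename);
}

A current_in_cgroup-style helper (plus the struct pid map above) would make
this walk unnecessary, which is really the point of that list.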