Andy Lutomirski <l...@amacapital.net> writes:

> On Thu, Feb 2, 2017 at 8:33 PM, Eric W. Biederman <ebied...@xmission.com>
> wrote:
>> Alexei Starovoitov <a...@fb.com> writes:
>>
>>> On 1/26/17 11:07 AM, Andy Lutomirski wrote:
>>>> On Thu, Jan 26, 2017 at 10:32 AM, Alexei Starovoitov <a...@fb.com> wrote:
>>>>> On 1/26/17 10:12 AM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Thu, Jan 26, 2017 at 9:46 AM, Alexei Starovoitov <a...@fb.com> wrote:
>>>>>>>
>>>>>>> On 1/26/17 8:37 AM, Andy Lutomirski wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Think of bpf programs as safe kernel modules. They don't have
>>>>>>>>> confined boundaries and program authors, if not careful, can shoot
>>>>>>>>> themselves in the foot. We're not trying to prevent that because
>>>>>>>>> it's impossible to check that the program is sane. Just like
>>>>>>>>> it's impossible to check that kernel module is sane.
>>>>>>>>> But in case of bpf we check that bpf program is _safe_ from the kernel
>>>>>>>>> point of view. If it's doing some garbage, it's program's business.
>>>>>>>>> Does it make more sense now?
>>>>>>>>>
>>>>>>>>
>>>>>>>> With all due respect, I think this is not an acceptable way to think
>>>>>>>> about BPF at all. If you think of BPF this way, I think there needs
>>>>>>>> to be a real discussion at KS or similar as to whether this is okay.
>>>>>>>> The reason is simple: the kernel promises a stable ABI to userspace
>>>>>>>> but not to kernel modules. By thinking of BPF as more like a module,
>>>>>>>> you're taking a big shortcut that will either result in ABI breakage
>>>>>>>> down the road or in committing to a problematic stable ABI.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> you misunderstood the analogy.
>>>>>>> bpf abi is certainly stable. that's why we were careful of not
>>>>>>> exposing anything to it that is not already stable.
>>>>>>>
>>>>>>
>>>>>> In that case I don't understand what you're trying to say. Eric
>>>>>> thinks your patch exposes a bad interface. A bad interface for
>>>>>> userspace is a very different thing from a bad interface available to
>>>>>> kernel modules. Are you saying that BPF is kernel-module-like in that
>>>>>> the ABI exposed to BPF programs doesn't need to meet the same quality
>>>>>> standards as userspace ABIs?
>>>>>
>>>>>
>>>>> of course not.
>>>>> ns.inum is already exposed to user space as a value.
>>>>> This patch exposes it to bpf program in a convenient and stable way,
>>>>
>>>> Here's what I'm imaging Eric is thinking:
>>>>
>>>> ns.inum is currently exposed to userspace via procfs. In principle,
>>>> the value could be local to a namespace, though, which would enable
>>>> CRIU to be able to preserve namespace inode numbers across a
>>>> checkpoint+restore operation. If this happened, the contained and
>>>> restored procfs would see a different inode number than the outermost
>>>> procfs.
>>>
>>> sure. there are many different ways for the program to see inode
>>> that either was already reused or disappeared.
>>> What I'm saying that it is expected. We cannot prevent that from
>>> bpf side. Just like ifindex value read by the program can be bogus
>>> as in the example I just provided.
>>
>> The point is that we can make the inode number stable across migration
>> and the user space API for namespaces has been designed with that
>> possibility in mind.
>
> How does it help if BPF starts exposing both inode number and device
> number?

Adding the device number comparison helps in that it is explicit what
is being compared against.  That gives me at least a bit of a namespace
for the namespaces, and a program from a sufficiently wrong context will
have its comparisons fail rather than having a match.

I think the operation that is exported to BPF should be a full
comparison operation of device and inode number so that it could be
optimized/compiled to something else depending upon the context.  AKA
the compilation of the bpf program would have the opportunity to remove
the namespace dependency and make the program work in a global context,
so we don't have to carry namespace information around at run time.

> ISTM any ability to migrate namespaces and to migrate eBPF programs
> that know about namespaces needs to have the eBPF program firmly
> rooted in some namespace (or perhaps cgroup in this case) so that it
> can see a namespaced view of the world.  For this to work, presumably
> we need to make sure that eBPF programs that are installed by programs
> that are in a container don't see traffic that isn't in that
> container.  This is part of why I think that we should consider
> preventing programs that aren't in the root namespace (perhaps *all*
> the root namespaces) from installing bpf+cgroup programs in the first
> place until there's a clearer understanding of how this all fits
> together.

Andy, I agree.  At least to the point that those programs are reading
attributes that are in a namespace.  That is something that should be
straightforward to verify in the bpf checker when installing the program.

Eric
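
For reference, a minimal user-space sketch of the "full comparison of
device and inode number" discussed above.  It assumes the namespaces are
reachable as nsfs paths such as /proc/self/ns/net.  The helper names
ns_id_equal() and ns_path_equal() are hypothetical and exist only for
illustration; this is not the BPF helper under discussion, only the same
comparison done with stat(2).

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: treat two namespace handles as the same namespace
 * only if both the device number and the inode number match.  An inode
 * number alone is only meaningful relative to the filesystem it is on.
 */
static bool ns_id_equal(const struct stat *a, const struct stat *b)
{
	return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Hypothetical helper: compare the namespaces behind two paths, for
 * example /proc/self/ns/net and /proc/1/ns/net.  stat() follows the
 * symlink, so the result describes the namespace object itself.
 * Returns 1 if they match, 0 if they differ, -1 if either stat() fails.
 */
static int ns_path_equal(const char *p1, const char *p2)
{
	struct stat s1, s2;

	if (stat(p1, &s1) != 0 || stat(p2, &s2) != 0)
		return -1;
	return ns_id_equal(&s1, &s2) ? 1 : 0;
}
```

With this shape, a caller looking at the wrong device gets a failed
comparison instead of an accidental inode-number match, which is the
property argued for above.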