On Fri, Aug 10, 2018 at 09:55:35AM +0200, Daniel Borkmann wrote: > On 08/10/2018 04:13 AM, Alexei Starovoitov wrote: > > On Fri, Aug 10, 2018 at 12:43:20AM +0200, Daniel Borkmann wrote: > >> On 08/09/2018 11:44 PM, Alexei Starovoitov wrote: > >>> On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote: > >>>> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote: > >>>>> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote: > >>>>>> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to > >>>>>> the basic arraymap") enabled support for BTF and dumping via > >>>>>> BPF fs for arraymap. However, both can be decoupled from each > >>>>>> other such that all BPF maps can be supported for attaching > >>>>>> BTF key/value information, while not all maps necessarily > >>>>>> need to dump via map_seq_show_elem() callback. > >>>>>> > >>>>>> The check in array_map_check_btf() can be generalized as > >>>>>> ultimatively the key and value size is the only contraint > >>>>>> that needs to match for the map. The fact that the key needs > >>>>>> to be of type int is optional; it could be any data type as > >>>>>> long as it matches the 4 byte key size, just like hash table > >>>>>> key or others could be of any data type as well. > >>>>>> > >>>>>> Minimal example of a hash table dump which then works out > >>>>>> of the box for bpftool: > >>>>>> > >>>>>> # bpftool map dump id 19 > >>>>>> [{ > >>>>>> "key": { > >>>>>> "": { > >>>>>> "vip": 0, > >>>>>> "vipv6": [] > >>>>>> }, > >>>>>> "port": 0, > >>>>>> "family": 0, > >>>>>> "proto": 0 > >>>>>> }, > >>>>>> "value": { > >>>>>> "flags": 0, > >>>>>> "vip_num": 0 > >>>>>> } > >>>>>> } > >>>>>> ] > >>>>>> > >>>>>> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net> > >>>>>> Cc: Yonghong Song <y...@fb.com> > >>>>>> --- > >>>>>> include/linux/bpf.h | 4 +--- > >>>>>> kernel/bpf/arraymap.c | 27 --------------------------- > >>>>>> kernel/bpf/inode.c | 3 ++- > >>>>>> kernel/bpf/syscall.c | 24 ++++++++++++++++++++---- > >>>>>> 4 files changed, 23 insertions(+), 35 deletions(-) > >>>>>> > >>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h > >>>>>> index cd8790d..eb76e8e 100644 > >>>>>> --- a/include/linux/bpf.h > >>>>>> +++ b/include/linux/bpf.h > >>>>>> @@ -48,8 +48,6 @@ struct bpf_map_ops { > >>>>>> u32 (*map_fd_sys_lookup_elem)(void *ptr); > >>>>>> void (*map_seq_show_elem)(struct bpf_map *map, void *key, > >>>>>> struct seq_file *m); > >>>>>> - int (*map_check_btf)(const struct bpf_map *map, const struct > >>>>>> btf *btf, > >>>>>> - u32 key_type_id, u32 value_type_id); > >>>>>> }; > >>>>>> > >>>>>> struct bpf_map { > >>>>>> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const > >>>>>> struct bpf_map *map) > >>>>>> > >>>>>> static inline bool bpf_map_support_seq_show(const struct bpf_map *map) > >>>>>> { > >>>>>> - return map->ops->map_seq_show_elem && map->ops->map_check_btf; > >>>>>> + return map->btf && map->ops->map_seq_show_elem; > >>>>>> } > >>>>>> > >>>>>> extern const struct bpf_map_ops bpf_map_offload_ops; > >>>>>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > >>>>>> index 2aa55d030..67f0bdf 100644 > >>>>>> --- a/kernel/bpf/arraymap.c > >>>>>> +++ b/kernel/bpf/arraymap.c > >>>>>> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct > >>>>>> bpf_map *map, void *key, > >>>>>> rcu_read_unlock(); > >>>>>> } > >>>>>> > >>>>>> -static int array_map_check_btf(const struct bpf_map *map, const > >>>>>> struct btf *btf, > >>>>>> - u32 btf_key_id, u32 btf_value_id) > >>>>>> -{ > >>>>>> - const struct btf_type *key_type, *value_type; > >>>>>> - u32 key_size, value_size; > >>>>>> - u32 int_data; > >>>>>> - > >>>>>> - key_type = btf_type_id_size(btf, &btf_key_id, &key_size); > >>>>>> - if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT) > >>>>>> - return -EINVAL; > >>>>>> - > >>>>>> - int_data = *(u32 *)(key_type + 1); > >>>>>> - /* bpf array can only take a u32 key. This check makes > >>>>>> - * sure that the btf matches the attr used during map_create. > >>>>>> - */ > >>>>>> - if (BTF_INT_BITS(int_data) != 32 || key_size != 4 || > >>>>>> - BTF_INT_OFFSET(int_data)) > >>>>>> - return -EINVAL; > >>>>> > >>>>> I think most of these checks are still necessary for array type. > >>>>> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM > >>>>> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't > >>>>> makes sense. > >>>> > >>>> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array, > >>>> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY > >>>> could be array of u8 foo[4], for example, or u16 foo[2]. But how would > >>>> it ultimately be different from e.g. having 'struct a' versus 'struct b' > >>>> where both are of same size and while actual key has 'struct a', the one > >>>> who writes the prog resp. loads the BTF into the kernel would lie about > >>>> it stating it's of type 'struct b' instead? It's basically trusting the > >>>> app that it advertised sane key types which kernel is propagating back. > >>> > >>> for hash map - yes. the kernel cannot yet catch the lie that > >>> key == 'struct a' that user said in BTF is not what program used > >>> (which used 'struct b' of the same size). > >>> Eventually we will annotate all load/store in the program and will > >>> make sure that memory access match what BTF said. > >> > >> But in that case, would you reject the program? E.g. from prog point of > >> view, it's just a buffer of x bytes, so key could be casted to different > >> struct/types potentially and used for lookup; similar with value if you > >> would go the route to annotate all access into it. I don't think this > >> serves as a security feature (since you might as well just load the prog > >> without BTF just fine), but it can be used to help verifier to perform > >> rewrites like in tracing for implicit bpf_probe_read() based on member > >> access. But also in that case, if you might have e.g. stale or wrong BTF > >> data e.g. of the whole kernel or some application it would follow/walk > >> that one instead. But such user error would be "acceptable" since it serves > >> as a hint, roughly similar to (explicitly) walking the data structures > >> based on the headers today, you do have better control in terms of header > >> mismatches in that you can ship the BTF directly from the build, but > >> there's > >> still no guarantee in that sense that you "verified" that these bytes > >> originally were mapped to struct foo somewhere in a C program. > > > > I wouldn't view such key checks as safety feature, but rather > > as honesty check. Meaning that user space shouldn't be able to cheat > > by passing completely bogus BTF into the kernel that only > > statisfies single size check. > > Ok, meaning, we agree that BTF is passed as a hint to the kernel. It is > a 'should not cheat' requirement but not 'cannot cheat', meaning if it > was the latter, BTF is core part of the verifier's safety analysis / > checks, but as it's optional on top of it (well, due to compatibility it > kind of needs to be anyway), it doesn't affect kernel's safety in > relation to what the program is doing internally. > > > Say, BTF has a key that is a 4 byte struct { char a,b,c,d; }; > > but program operates with it as u32. I very much would like > > the verifier to notice and reject such program, since if C code > > has struct { char a,b,c,d; }; the compiler won't generate u32 access > > to it unless C code type casts, but then llvm will warn. So for u32 > > to legitimately appear in the generated code the struct should be: > > union { > > struct { char a,b,c,d;} > > u32 e; > > }; > > Narrow access (like u8 load/store in the bpf prog form u32 BTF field) > > is ok, since that's normal compiler optimization, but any other > > field/size mismatch the verifier should reject to prevent cheating. > > > > In other words even proprietary bpf programs should not be able to cheat. > > If they provide BTF to the kernel, it should correspond exactly to the > > program. > > That's the key to _trusted_ introspection. > > But then wouldn't this artificially force users into programming / > thinking style that verifier dictates upon them similar as we have > today as opposed to further moving away from it to allow more C-style > programs to be accepted? > > But even if that is the goal, it is still used as a hint today, e.g. > even for an array I could tell the compiler that the key is '__u32' > for BTF sake while using the underlying key differently (e.g. as struct, > enum, etc) in order to pass the array_map_check_btf() check in the > kernel. Those type of programs would still need to be accepted due > to compatibility reasons. Same if we only test on size match in other > maps, it's nothing different. I don't see how verifier would start > enforcing programs to be rejected based on access patterns on the > types; while it might work for newly added program types that this > could be enforced, it cannot for all the many existing program types, > but I also don't really think it needs to be. Unless it is declared > a safety feature for future program types. > > > Admin that ssh-es into the box and operates with bpftool should be > > certain that the map introspection represent the real situation of > > the program. If proprietary prog is paranoid about layout of map > > it shouldn't be using BTF, but if it does, BTF should match. > > In the future I'd like to enfore availability of BTF for > > new program types. > > Sure, and in vast majority / normal cases it will represent the real > situation. Do you mistrust DWARF debugging data that gets shipped with > your distro, for example? pahole and other tools on top of it? It is > pretty much the same situation. Given gdb example below, does gdb do > any verification of the binary in order to analyze load/store patterns > and whether it matches with what sits in DWARF as types? It would just > as well map the types into memory and dump it as pretty print instead. > And that's okay, the one thing that needs to be verified for correctness > resp. validity is the debugging format itself so it can be used for > that. I think decoupling map_seq_show_elem() requirement from BTF support during map_create() is ok. There is plan to do that going forward if it ever hits some map types that is difficult to do map_seq_show_elem(). However, I think most of the maps should have pretty straight forward seq_show implementation considering most of the heavy lifting has already been done in btf_*_seq_show() and bpffs_map_seq_ops.
Regarding one general map_check_btf() for all maps' purpose, I think not all bpf maps are behaving the same. There are some property that the kernel implies on a particular map type. Some of them even have kernel generated value in it. For example, map-in-map, the lookup_elem() value is a map's id which must be an integer generated by kernel's idr_alloc_cyclic(). Treating it as enum/struct/array...etc seems wrong. Hence, I think each map type should be able to provide its own map_check_btf(). If a map does not have specific key/value requirement, it can use the default one which is the min requirement for kernel safety reason. The map can provide a more strict requirement first and relax it later when use cases come up. > > >>> For array we can catch the lie today that key is not 4 byte int, > >>> since it matters from pretty printing point of view. > >>> If it's PTR or ARRAY or STRUCT, the printer will go nuts. > >> > >> In that case, would you enforce a hash map key size of 4 also to INT-only > >> instead of e.g. allowing STRUCT and various others? > > > > hash map key of 4 bytes can be anything and printing is ideally > > done similar to the way gdb prints stl c++: > > #include <map> > > > > struct K { > > int a, b; > > }; > > struct V { > > int c, d; > > }; > > int operator<(const K& a, const K& b) {return a.a < b.a;} > > std::map<K, V> m; > > > > int main() > > { > > K k1 = {1, 2}; > > K k2 = {3, 4}; > > V v1 = {5, 6}; > > V v2 = {7, 8}; > > m[k1] = v1; > > m[k2] = v2; > > ... > > (gdb) p m > > $3 = std::map with 2 elements = {[{a = 1, b = 2}] = {c = 5, d = 6}, [{a = > > 3, b = 4}] = {c = 7, d = 8}} > > (gdb) set print pretty on > > (gdb) p m > > $4 = std::map with 2 elements = { > > [{ > > a = 1, > > b = 2 > > }] = { > > c = 5, > > d = 6 > > }, > > [{ > > a = 3, > > b = 4 > > }] = { > > c = 7, > > d = 8 > > } > > } >