On 08/10/2018 02:54 PM, Martin KaFai Lau wrote: > On Fri, Aug 10, 2018 at 09:55:35AM +0200, Daniel Borkmann wrote: >> On 08/10/2018 04:13 AM, Alexei Starovoitov wrote: >>> On Fri, Aug 10, 2018 at 12:43:20AM +0200, Daniel Borkmann wrote: >>>> On 08/09/2018 11:44 PM, Alexei Starovoitov wrote: >>>>> On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote: >>>>>> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote: >>>>>>> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote: >>>>>>>> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to >>>>>>>> the basic arraymap") enabled support for BTF and dumping via >>>>>>>> BPF fs for arraymap. However, both can be decoupled from each >>>>>>>> other such that all BPF maps can be supported for attaching >>>>>>>> BTF key/value information, while not all maps necessarily >>>>>>>> need to dump via map_seq_show_elem() callback. >>>>>>>> >>>>>>>> The check in array_map_check_btf() can be generalized as >>>>>>>> ultimatively the key and value size is the only contraint >>>>>>>> that needs to match for the map. The fact that the key needs >>>>>>>> to be of type int is optional; it could be any data type as >>>>>>>> long as it matches the 4 byte key size, just like hash table >>>>>>>> key or others could be of any data type as well. >>>>>>>> >>>>>>>> Minimal example of a hash table dump which then works out >>>>>>>> of the box for bpftool: >>>>>>>> >>>>>>>> # bpftool map dump id 19 >>>>>>>> [{ >>>>>>>> "key": { >>>>>>>> "": { >>>>>>>> "vip": 0, >>>>>>>> "vipv6": [] >>>>>>>> }, >>>>>>>> "port": 0, >>>>>>>> "family": 0, >>>>>>>> "proto": 0 >>>>>>>> }, >>>>>>>> "value": { >>>>>>>> "flags": 0, >>>>>>>> "vip_num": 0 >>>>>>>> } >>>>>>>> } >>>>>>>> ] >>>>>>>> >>>>>>>> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net> >>>>>>>> Cc: Yonghong Song <y...@fb.com> >>>>>>>> --- >>>>>>>> include/linux/bpf.h | 4 +--- >>>>>>>> kernel/bpf/arraymap.c | 27 --------------------------- >>>>>>>> kernel/bpf/inode.c | 3 ++- >>>>>>>> kernel/bpf/syscall.c | 24 ++++++++++++++++++++---- >>>>>>>> 4 files changed, 23 insertions(+), 35 deletions(-) >>>>>>>> >>>>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >>>>>>>> index cd8790d..eb76e8e 100644 >>>>>>>> --- a/include/linux/bpf.h >>>>>>>> +++ b/include/linux/bpf.h >>>>>>>> @@ -48,8 +48,6 @@ struct bpf_map_ops { >>>>>>>> u32 (*map_fd_sys_lookup_elem)(void *ptr); >>>>>>>> void (*map_seq_show_elem)(struct bpf_map *map, void *key, >>>>>>>> struct seq_file *m); >>>>>>>> - int (*map_check_btf)(const struct bpf_map *map, const struct >>>>>>>> btf *btf, >>>>>>>> - u32 key_type_id, u32 value_type_id); >>>>>>>> }; >>>>>>>> >>>>>>>> struct bpf_map { >>>>>>>> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const >>>>>>>> struct bpf_map *map) >>>>>>>> >>>>>>>> static inline bool bpf_map_support_seq_show(const struct bpf_map *map) >>>>>>>> { >>>>>>>> - return map->ops->map_seq_show_elem && map->ops->map_check_btf; >>>>>>>> + return map->btf && map->ops->map_seq_show_elem; >>>>>>>> } >>>>>>>> >>>>>>>> extern const struct bpf_map_ops bpf_map_offload_ops; >>>>>>>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c >>>>>>>> index 2aa55d030..67f0bdf 100644 >>>>>>>> --- a/kernel/bpf/arraymap.c >>>>>>>> +++ b/kernel/bpf/arraymap.c >>>>>>>> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct >>>>>>>> bpf_map *map, void *key, >>>>>>>> rcu_read_unlock(); >>>>>>>> } >>>>>>>> >>>>>>>> -static int array_map_check_btf(const struct bpf_map *map, const >>>>>>>> struct btf *btf, >>>>>>>> - u32 btf_key_id, u32 btf_value_id) >>>>>>>> -{ >>>>>>>> - const struct btf_type *key_type, *value_type; >>>>>>>> - u32 key_size, value_size; >>>>>>>> - u32 int_data; >>>>>>>> - >>>>>>>> - key_type = btf_type_id_size(btf, &btf_key_id, &key_size); >>>>>>>> - if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT) >>>>>>>> - return -EINVAL; >>>>>>>> - >>>>>>>> - int_data = *(u32 *)(key_type + 1); >>>>>>>> - /* bpf array can only take a u32 key. This check makes >>>>>>>> - * sure that the btf matches the attr used during map_create. >>>>>>>> - */ >>>>>>>> - if (BTF_INT_BITS(int_data) != 32 || key_size != 4 || >>>>>>>> - BTF_INT_OFFSET(int_data)) >>>>>>>> - return -EINVAL; >>>>>>> >>>>>>> I think most of these checks are still necessary for array type. >>>>>>> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM >>>>>>> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't >>>>>>> makes sense. >>>>>> >>>>>> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array, >>>>>> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY >>>>>> could be array of u8 foo[4], for example, or u16 foo[2]. But how would >>>>>> it ultimately be different from e.g. having 'struct a' versus 'struct b' >>>>>> where both are of same size and while actual key has 'struct a', the one >>>>>> who writes the prog resp. loads the BTF into the kernel would lie about >>>>>> it stating it's of type 'struct b' instead? It's basically trusting the >>>>>> app that it advertised sane key types which kernel is propagating back. >>>>> >>>>> for hash map - yes. the kernel cannot yet catch the lie that >>>>> key == 'struct a' that user said in BTF is not what program used >>>>> (which used 'struct b' of the same size). >>>>> Eventually we will annotate all load/store in the program and will >>>>> make sure that memory access match what BTF said. >>>> >>>> But in that case, would you reject the program? E.g. from prog point of >>>> view, it's just a buffer of x bytes, so key could be casted to different >>>> struct/types potentially and used for lookup; similar with value if you >>>> would go the route to annotate all access into it. I don't think this >>>> serves as a security feature (since you might as well just load the prog >>>> without BTF just fine), but it can be used to help verifier to perform >>>> rewrites like in tracing for implicit bpf_probe_read() based on member >>>> access. But also in that case, if you might have e.g. stale or wrong BTF >>>> data e.g. of the whole kernel or some application it would follow/walk >>>> that one instead. But such user error would be "acceptable" since it serves >>>> as a hint, roughly similar to (explicitly) walking the data structures >>>> based on the headers today, you do have better control in terms of header >>>> mismatches in that you can ship the BTF directly from the build, but >>>> there's >>>> still no guarantee in that sense that you "verified" that these bytes >>>> originally were mapped to struct foo somewhere in a C program. >>> >>> I wouldn't view such key checks as safety feature, but rather >>> as honesty check. Meaning that user space shouldn't be able to cheat >>> by passing completely bogus BTF into the kernel that only >>> statisfies single size check. >> >> Ok, meaning, we agree that BTF is passed as a hint to the kernel. It is >> a 'should not cheat' requirement but not 'cannot cheat', meaning if it >> was the latter, BTF is core part of the verifier's safety analysis / >> checks, but as it's optional on top of it (well, due to compatibility it >> kind of needs to be anyway), it doesn't affect kernel's safety in >> relation to what the program is doing internally. >> >>> Say, BTF has a key that is a 4 byte struct { char a,b,c,d; }; >>> but program operates with it as u32. I very much would like >>> the verifier to notice and reject such program, since if C code >>> has struct { char a,b,c,d; }; the compiler won't generate u32 access >>> to it unless C code type casts, but then llvm will warn. So for u32 >>> to legitimately appear in the generated code the struct should be: >>> union { >>> struct { char a,b,c,d;} >>> u32 e; >>> }; >>> Narrow access (like u8 load/store in the bpf prog form u32 BTF field) >>> is ok, since that's normal compiler optimization, but any other >>> field/size mismatch the verifier should reject to prevent cheating. >>> >>> In other words even proprietary bpf programs should not be able to cheat. >>> If they provide BTF to the kernel, it should correspond exactly to the >>> program. >>> That's the key to _trusted_ introspection. >> >> But then wouldn't this artificially force users into programming / >> thinking style that verifier dictates upon them similar as we have >> today as opposed to further moving away from it to allow more C-style >> programs to be accepted? >> >> But even if that is the goal, it is still used as a hint today, e.g. >> even for an array I could tell the compiler that the key is '__u32' >> for BTF sake while using the underlying key differently (e.g. as struct, >> enum, etc) in order to pass the array_map_check_btf() check in the >> kernel. Those type of programs would still need to be accepted due >> to compatibility reasons. Same if we only test on size match in other >> maps, it's nothing different. I don't see how verifier would start >> enforcing programs to be rejected based on access patterns on the >> types; while it might work for newly added program types that this >> could be enforced, it cannot for all the many existing program types, >> but I also don't really think it needs to be. Unless it is declared >> a safety feature for future program types. >> >>> Admin that ssh-es into the box and operates with bpftool should be >>> certain that the map introspection represent the real situation of >>> the program. If proprietary prog is paranoid about layout of map >>> it shouldn't be using BTF, but if it does, BTF should match. >>> In the future I'd like to enfore availability of BTF for >>> new program types. >> >> Sure, and in vast majority / normal cases it will represent the real >> situation. Do you mistrust DWARF debugging data that gets shipped with >> your distro, for example? pahole and other tools on top of it? It is >> pretty much the same situation. Given gdb example below, does gdb do >> any verification of the binary in order to analyze load/store patterns >> and whether it matches with what sits in DWARF as types? It would just >> as well map the types into memory and dump it as pretty print instead. >> And that's okay, the one thing that needs to be verified for correctness >> resp. validity is the debugging format itself so it can be used for >> that. > I think decoupling map_seq_show_elem() requirement from BTF support > during map_create() is ok. There is plan to do that going forward if > it ever hits some map types that is difficult to do map_seq_show_elem(). > However, I think most of the maps should have pretty straight > forward seq_show implementation considering most of the > heavy lifting has already been done in btf_*_seq_show() > and bpffs_map_seq_ops.
I think potentially this could also be made generic for majority of maps in that you do the lookup and then the key/value btf_type_seq_show() dump from it under RCU. > Regarding one general map_check_btf() for all maps' purpose, > I think not all bpf maps are behaving the same. > > There are some property that the kernel implies on a particular > map type. Some of them even have kernel generated value in it. > For example, map-in-map, the lookup_elem() value is a map's id > which must be an integer generated by kernel's idr_alloc_cyclic(). > Treating it as enum/struct/array...etc seems wrong. > Hence, I think each map type should be able to provide > its own map_check_btf(). If a map does not have > specific key/value requirement, it can use the default one which > is the min requirement for kernel safety reason. Yeah, fair point, I think potentially for the in-kernel generated ones at some point they could provide their own in-kernel BTF description. Ok, I'll rework to have a default and stricter map specific override if present which those in-kernel ones will reject then. Thanks, Daniel