Re: [PATCH bpf-next] bpf: enable btf for use in all maps

Martin KaFai Lau Fri, 10 Aug 2018 05:55:42 -0700

On Fri, Aug 10, 2018 at 09:55:35AM +0200, Daniel Borkmann wrote:
> On 08/10/2018 04:13 AM, Alexei Starovoitov wrote:
> > On Fri, Aug 10, 2018 at 12:43:20AM +0200, Daniel Borkmann wrote:
> >> On 08/09/2018 11:44 PM, Alexei Starovoitov wrote:
> >>> On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote:
> >>>> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
> >>>>> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
> >>>>>> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
> >>>>>> the basic arraymap") enabled support for BTF and dumping via
> >>>>>> BPF fs for arraymap. However, both can be decoupled from each
> >>>>>> other such that all BPF maps can be supported for attaching
> >>>>>> BTF key/value information, while not all maps necessarily
> >>>>>> need to dump via map_seq_show_elem() callback.
> >>>>>>
> >>>>>> The check in array_map_check_btf() can be generalized as
> >>>>>> ultimatively the key and value size is the only contraint
> >>>>>> that needs to match for the map. The fact that the key needs
> >>>>>> to be of type int is optional; it could be any data type as
> >>>>>> long as it matches the 4 byte key size, just like hash table
> >>>>>> key or others could be of any data type as well.
> >>>>>>
> >>>>>> Minimal example of a hash table dump which then works out
> >>>>>> of the box for bpftool:
> >>>>>>
> >>>>>>   # bpftool map dump id 19
> >>>>>>   [{
> >>>>>>           "key": {
> >>>>>>               "": {
> >>>>>>                   "vip": 0,
> >>>>>>                   "vipv6": []
> >>>>>>               },
> >>>>>>               "port": 0,
> >>>>>>               "family": 0,
> >>>>>>               "proto": 0
> >>>>>>           },
> >>>>>>           "value": {
> >>>>>>               "flags": 0,
> >>>>>>               "vip_num": 0
> >>>>>>           }
> >>>>>>       }
> >>>>>>   ]
> >>>>>>
> >>>>>> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> >>>>>> Cc: Yonghong Song <y...@fb.com>
> >>>>>> ---
> >>>>>>  include/linux/bpf.h   |  4 +---
> >>>>>>  kernel/bpf/arraymap.c | 27 ---------------------------
> >>>>>>  kernel/bpf/inode.c    |  3 ++-
> >>>>>>  kernel/bpf/syscall.c  | 24 ++++++++++++++++++++----
> >>>>>>  4 files changed, 23 insertions(+), 35 deletions(-)
> >>>>>>
> >>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >>>>>> index cd8790d..eb76e8e 100644
> >>>>>> --- a/include/linux/bpf.h
> >>>>>> +++ b/include/linux/bpf.h
> >>>>>> @@ -48,8 +48,6 @@ struct bpf_map_ops {
> >>>>>>        u32 (*map_fd_sys_lookup_elem)(void *ptr);
> >>>>>>        void (*map_seq_show_elem)(struct bpf_map *map, void *key,
> >>>>>>                                  struct seq_file *m);
> >>>>>> -      int (*map_check_btf)(const struct bpf_map *map, const struct 
> >>>>>> btf *btf,
> >>>>>> -                           u32 key_type_id, u32 value_type_id);
> >>>>>>  };
> >>>>>>  
> >>>>>>  struct bpf_map {
> >>>>>> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const 
> >>>>>> struct bpf_map *map)
> >>>>>>  
> >>>>>>  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
> >>>>>>  {
> >>>>>> -      return map->ops->map_seq_show_elem && map->ops->map_check_btf;
> >>>>>> +      return map->btf && map->ops->map_seq_show_elem;
> >>>>>>  }
> >>>>>>  
> >>>>>>  extern const struct bpf_map_ops bpf_map_offload_ops;
> >>>>>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> >>>>>> index 2aa55d030..67f0bdf 100644
> >>>>>> --- a/kernel/bpf/arraymap.c
> >>>>>> +++ b/kernel/bpf/arraymap.c
> >>>>>> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct 
> >>>>>> bpf_map *map, void *key,
> >>>>>>        rcu_read_unlock();
> >>>>>>  }
> >>>>>>  
> >>>>>> -static int array_map_check_btf(const struct bpf_map *map, const 
> >>>>>> struct btf *btf,
> >>>>>> -                             u32 btf_key_id, u32 btf_value_id)
> >>>>>> -{
> >>>>>> -      const struct btf_type *key_type, *value_type;
> >>>>>> -      u32 key_size, value_size;
> >>>>>> -      u32 int_data;
> >>>>>> -
> >>>>>> -      key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
> >>>>>> -      if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> >>>>>> -              return -EINVAL;
> >>>>>> -
> >>>>>> -      int_data = *(u32 *)(key_type + 1);
> >>>>>> -      /* bpf array can only take a u32 key.  This check makes
> >>>>>> -       * sure that the btf matches the attr used during map_create.
> >>>>>> -       */
> >>>>>> -      if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
> >>>>>> -          BTF_INT_OFFSET(int_data))
> >>>>>> -              return -EINVAL;
> >>>>>
> >>>>> I think most of these checks are still necessary for array type.
> >>>>> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
> >>>>> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't 
> >>>>> makes sense.
> >>>>
> >>>> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
> >>>> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
> >>>> could be array of u8 foo[4], for example, or u16 foo[2]. But how would
> >>>> it ultimately be different from e.g. having 'struct a' versus 'struct b'
> >>>> where both are of same size and while actual key has 'struct a', the one
> >>>> who writes the prog resp. loads the BTF into the kernel would lie about
> >>>> it stating it's of type 'struct b' instead? It's basically trusting the
> >>>> app that it advertised sane key types which kernel is propagating back.
> >>>
> >>> for hash map - yes. the kernel cannot yet catch the lie that
> >>> key == 'struct a' that user said in BTF is not what program used
> >>> (which used 'struct b' of the same size).
> >>> Eventually we will annotate all load/store in the program and will
> >>> make sure that memory access match what BTF said.
> >>
> >> But in that case, would you reject the program? E.g. from prog point of
> >> view, it's just a buffer of x bytes, so key could be casted to different
> >> struct/types potentially and used for lookup; similar with value if you
> >> would go the route to annotate all access into it. I don't think this
> >> serves as a security feature (since you might as well just load the prog
> >> without BTF just fine), but it can be used to help verifier to perform
> >> rewrites like in tracing for implicit bpf_probe_read() based on member
> >> access. But also in that case, if you might have e.g. stale or wrong BTF
> >> data e.g. of the whole kernel or some application it would follow/walk
> >> that one instead. But such user error would be "acceptable" since it serves
> >> as a hint, roughly similar to (explicitly) walking the data structures
> >> based on the headers today, you do have better control in terms of header
> >> mismatches in that you can ship the BTF directly from the build, but 
> >> there's
> >> still no guarantee in that sense that you "verified" that these bytes
> >> originally were mapped to struct foo somewhere in a C program.
> > 
> > I wouldn't view such key checks as safety feature, but rather
> > as honesty check. Meaning that user space shouldn't be able to cheat
> > by passing completely bogus BTF into the kernel that only
> > statisfies single size check.
> 
> Ok, meaning, we agree that BTF is passed as a hint to the kernel. It is
> a 'should not cheat' requirement but not 'cannot cheat', meaning if it
> was the latter, BTF is core part of the verifier's safety analysis /
> checks, but as it's optional on top of it (well, due to compatibility it
> kind of needs to be anyway), it doesn't affect kernel's safety in
> relation to what the program is doing internally.
> 
> > Say, BTF has a key that is a 4 byte struct { char a,b,c,d; };
> > but program operates with it as u32. I very much would like
> > the verifier to notice and reject such program, since if C code
> > has struct { char a,b,c,d; }; the compiler won't generate u32 access
> > to it unless C code type casts, but then llvm will warn. So for u32
> > to legitimately appear in the generated code the struct should be:
> > union {
> >   struct { char a,b,c,d;}
> >   u32 e;
> > };
> > Narrow access (like u8 load/store in the bpf prog form u32 BTF field)
> > is ok, since that's normal compiler optimization, but any other
> > field/size mismatch the verifier should reject to prevent cheating.
> > 
> > In other words even proprietary bpf programs should not be able to cheat.
> > If they provide BTF to the kernel, it should correspond exactly to the 
> > program.
> > That's the key to _trusted_ introspection.
> 
> But then wouldn't this artificially force users into programming /
> thinking style that verifier dictates upon them similar as we have
> today as opposed to further moving away from it to allow more C-style
> programs to be accepted?
> 
> But even if that is the goal, it is still used as a hint today, e.g.
> even for an array I could tell the compiler that the key is '__u32'
> for BTF sake while using the underlying key differently (e.g. as struct,
> enum, etc) in order to pass the array_map_check_btf() check in the
> kernel. Those type of programs would still need to be accepted due
> to compatibility reasons. Same if we only test on size match in other
> maps, it's nothing different. I don't see how verifier would start
> enforcing programs to be rejected based on access patterns on the
> types; while it might work for newly added program types that this
> could be enforced, it cannot for all the many existing program types,
> but I also don't really think it needs to be. Unless it is declared
> a safety feature for future program types.
> 
> > Admin that ssh-es into the box and operates with bpftool should be
> > certain that the map introspection represent the real situation of
> > the program. If proprietary prog is paranoid about layout of map
> > it shouldn't be using BTF, but if it does, BTF should match.
> > In the future I'd like to enfore availability of BTF for
> > new program types.
> 
> Sure, and in vast majority / normal cases it will represent the real
> situation. Do you mistrust DWARF debugging data that gets shipped with
> your distro, for example? pahole and other tools on top of it? It is
> pretty much the same situation. Given gdb example below, does gdb do
> any verification of the binary in order to analyze load/store patterns
> and whether it matches with what sits in DWARF as types? It would just
> as well map the types into memory and dump it as pretty print instead.
> And that's okay, the one thing that needs to be verified for correctness
> resp. validity is the debugging format itself so it can be used for
> that.
I think decoupling map_seq_show_elem() requirement from BTF support
during map_create() is ok.  There is plan to do that going forward if
it ever hits some map types that is difficult to do map_seq_show_elem().
However, I think most of the maps should have pretty straight
forward seq_show implementation considering most of the
heavy lifting has already been done in btf_*_seq_show()
and bpffs_map_seq_ops.


Regarding one general map_check_btf() for all maps' purpose,
I think not all bpf maps are behaving the same.

There are some property that the kernel implies on a particular
map type.  Some of them even have kernel generated value in it.
For example, map-in-map, the lookup_elem() value is a map's id
which must be an integer generated by kernel's idr_alloc_cyclic().
Treating it as enum/struct/array...etc seems wrong.
Hence, I think each map type should be able to provide
its own map_check_btf().  If a map does not have
specific key/value requirement, it can use the default one which
is the min requirement for kernel safety reason.

The map can provide a more strict requirement first and relax it
later when use cases come up.

> 
> >>> For array we can catch the lie today that key is not 4 byte int,
> >>> since it matters from pretty printing point of view.
> >>> If it's PTR or ARRAY or STRUCT, the printer will go nuts.
> >>
> >> In that case, would you enforce a hash map key size of 4 also to INT-only
> >> instead of e.g. allowing STRUCT and various others?
> > 
> > hash map key of 4 bytes can be anything and printing is ideally
> > done similar to the way gdb prints stl c++:
> > #include <map>
> > 
> > struct K {
> >   int a, b;
> > };
> > struct V {
> >   int c, d;
> > };
> > int operator<(const K& a, const K& b) {return a.a < b.a;}
> > std::map<K, V> m;
> > 
> > int main()
> > {
> >   K k1 = {1, 2};
> >   K k2 = {3, 4};
> >   V v1 = {5, 6};
> >   V v2 = {7, 8};
> >   m[k1] = v1;
> >   m[k2] = v2;
> > ...
> > (gdb) p m
> > $3 = std::map with 2 elements = {[{a = 1, b = 2}] = {c = 5, d = 6}, [{a = 
> > 3, b = 4}] = {c = 7, d = 8}}
> > (gdb) set print pretty on
> > (gdb) p m
> > $4 = std::map with 2 elements = {
> >   [{
> >     a = 1,
> >     b = 2
> >   }] = {
> >     c = 5,
> >     d = 6
> >   },
> >   [{
> >     a = 3,
> >     b = 4
> >   }] = {
> >     c = 7,
> >     d = 8
> >   }
> > }
>

Re: [PATCH bpf-next] bpf: enable btf for use in all maps

Reply via email to