Hi Yonghong, On 02/12/2018 10:58 PM, Yonghong Song wrote: > There is a memory leak happening in lpm_trie map_free callback > function trie_free. The trie structure itself does not get freed. > > Also, trie_free function did not do synchronize_rcu before freeing > various data structures. This is incorrect as some rcu_read_lock > region(s) for lookup, update, delete or get_next_key may not complete yet. > The fix is to add synchronize_rcu in the beginning of trie_free. > The useless spin_lock is removed from this function as well. > > Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map > implementation") > Reported-by: Mathieu Malaterre <ma...@debian.org> > Reported-by: Alexei Starovoitov <a...@kernel.org> > Tested-by: Mathieu Malaterre <ma...@debian.org> > Signed-off-by: Yonghong Song <y...@fb.com> > --- > kernel/bpf/lpm_trie.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c > index 7b469d1..9b41ea4 100644 > --- a/kernel/bpf/lpm_trie.c > +++ b/kernel/bpf/lpm_trie.c > @@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map) > struct lpm_trie_node __rcu **slot; > struct lpm_trie_node *node; > > - raw_spin_lock(&trie->lock); > + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0, > + * so the programs (can be more than one that used this map) were > + * disconnected from events. Wait for outstanding programs to complete > + * update/lookup/delete/get_next_key and free the trie. > + */
I think the first part of the comment is slightly misleading, e.g. map refcount could drop to zero also for various other reasons, not strictly due to prog refcnt dropping to 0, so I would just keep the second part with 'Wait for outstanding [...]' which is okay since this is eventually what is relevant in this context. > + synchronize_rcu(); > > /* Always start at the root and walk down to a node that has no > * children. Then free that node, nullify its reference in the parent > @@ -588,7 +593,7 @@ static void trie_free(struct bpf_map *map) > } > > unlock: > - raw_spin_unlock(&trie->lock); Could you do a minor change here: since we get rid of the locking as there's no user left anymore after grace period, it would be great if you could also change the 'unlock' label name above. Other than that, good to go, thanks! > + kfree(trie); > } > > static int trie_get_next_key(struct bpf_map *map, void *_key, void > *_next_key) > Thanks, Daniel