On Wed, Jan 27, 2021 at 8:55 AM Chinmay Agarwal <china...@codeaurora.org> wrote:
>
> Following race condition was detected:
> <CPU A, t0> - neigh_flush_dev() is under execution and calls
> neigh_mark_dead(n), marking the neighbour entry 'n' as dead.
>
> <CPU B, t1> - Executing: __netif_receive_skb() ->
> __netif_receive_skb_core() -> arp_rcv() -> arp_process(). arp_process()
> calls __neigh_lookup(), which takes a reference on neighbour entry 'n'.
>
> <CPU A, t2> - Moves further along neigh_flush_dev() and calls
> neigh_cleanup_and_release(n), but since the reference count increased
> in t1, 'n' couldn't be destroyed.
>
> <CPU B, t3> - Moves further along arp_process() and calls
> neigh_update() -> __neigh_update() -> neigh_update_gc_list(), which adds
> the neighbour entry back to gc_list (neigh_mark_dead() removed it
> earlier, in t0, from gc_list).
>
> <CPU B, t4> - arp_process() finally calls neigh_release(n), destroying
> the neighbour entry.
>
> This leads to 'n' still being part of gc_list, but the actual
> neighbour structure has been freed.
>
> The situation can be prevented from happening if we disallow a dead
> entry to have any possibility of updating gc_list. This is what the
> patch intends to achieve.
>
> Fixes: 9c29a2f55ec0 ("neighbor: Fix locking order for gc_list changes")
> Signed-off-by: Chinmay Agarwal <china...@codeaurora.org>

Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com>

Thanks.
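
For anyone following the t0..t4 sequence, here is a rough userspace sketch of the interleaving, replayed sequentially on one thread. It is not the kernel code and not the patch itself: struct toy_neigh, the toy_* helpers and the `guard` parameter are all hypothetical names made up for illustration, and the "gc list" is reduced to a single flag. It only tries to show why refusing to relink a dead entry leaves nothing dangling after the final release.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a neighbour entry; every name here is hypothetical. */
struct toy_neigh {
	int  refcnt;      /* references currently held on the entry */
	bool dead;        /* set once the flush path marks the entry dead */
	bool on_gc_list;  /* entry is linked on the (toy) gc list */
	bool freed;       /* last reference dropped; memory would be gone */
};

/* t0: the flush path marks the entry dead and unlinks it from the gc list. */
static void toy_mark_dead(struct toy_neigh *n)
{
	n->dead = true;
	n->on_gc_list = false;
}

/* t1: a lookup takes an extra reference on the entry. */
static void toy_hold(struct toy_neigh *n)
{
	n->refcnt++;
}

/* t2 and t4: drop one reference; the entry is freed when the count hits 0. */
static void toy_release(struct toy_neigh *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt == 0)
		n->freed = true;
}

/*
 * t3: the update path relinks the entry on the gc list.
 * With guard set, a dead entry is left alone, which is the idea
 * described in the patch text (assumed shape, not the actual diff).
 */
static void toy_update_gc_list(struct toy_neigh *n, bool guard)
{
	if (guard && n->dead)
		return;
	n->on_gc_list = true;
}

/*
 * Replays t0..t4 in order. Returns true when the entry ends up freed
 * while still linked on the gc list, i.e. the dangling-entry case.
 */
bool toy_replay_race(bool guard)
{
	struct toy_neigh n = { .refcnt = 1, .on_gc_list = true };

	toy_mark_dead(&n);              /* CPU A, t0 */
	toy_hold(&n);                   /* CPU B, t1 */
	toy_release(&n);                /* CPU A, t2: count 2 -> 1, entry survives */
	toy_update_gc_list(&n, guard);  /* CPU B, t3 */
	toy_release(&n);                /* CPU B, t4: count 1 -> 0, entry freed */

	return n.freed && n.on_gc_list;
}
```

Calling toy_replay_race(false) models the behaviour before the fix and toy_replay_race(true) the behaviour with the dead check in place. The real code also has to do the check under the proper locks, which this sketch ignores entirely.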