On Wed, Jan 27, 2021 at 8:55 AM Chinmay Agarwal <china...@codeaurora.org> wrote:
>
> The following race condition was detected:
> <CPU A, t0> - neigh_flush_dev() is under execution and calls
> neigh_mark_dead(n) marking the neighbour entry 'n' as dead.
>
> <CPU B, t1> - Executing: __netif_receive_skb() ->
> __netif_receive_skb_core() -> arp_rcv() -> arp_process(). arp_process()
> calls __neigh_lookup() which takes a reference on neighbour entry 'n'.
>
> <CPU A, t2> - Moves further along neigh_flush_dev() and calls
> neigh_cleanup_and_release(n), but since the reference count was
> increased at t1, 'n' cannot be destroyed yet.
>
> <CPU B, t3> - Moves further along arp_process() and calls
> neigh_update() -> __neigh_update() -> neigh_update_gc_list(), which adds
> the neighbour entry back to gc_list (neigh_mark_dead() had removed it
> from gc_list earlier, at t0).
>
> <CPU B, t4> - arp_process() finally calls neigh_release(n), destroying
> the neighbour entry.
>
> This leads to 'n' still being part of gc_list, but the actual
> neighbour structure has been freed.
>
> The situation can be prevented by never allowing a dead entry to
> update gc_list. This is what the patch intends to achieve.
>
> Fixes: 9c29a2f55ec0 ("neighbor: Fix locking order for gc_list changes")
> Signed-off-by: Chinmay Agarwal <china...@codeaurora.org>

Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com>

Thanks.
