On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan <sowmini.varad...@oracle.com> wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex. Too late...
>
> I'm not sure I follow- aiui the panic was in acceessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket. What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?
>
We probably have a dozen bugs to fix in af_packet.c. The race in fanout_add() is one of them.

I do not believe Anoob Soman sent his fixes, btw... (look for this thread: http://marc.info/?l=linux-netdev&m=148588680525648&w=2)
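The fanout_add() race described above is a check-then-lock pattern: each CPU tests po->fanout before taking the mutex, so two CPUs can both see NULL and both proceed. A minimal userspace sketch of the usual fix, assuming a pthread mutex stands in for the kernel's fanout mutex, is to perform the test and the state change under the same lock. `struct fake_sock`, `fake_fanout_add()` and `run_concurrent_joins()` are hypothetical names for illustration only, not af_packet.c code.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for per-socket state; NOT the real struct packet_sock. */
struct fake_sock {
    void *fanout;          /* non-NULL once the socket has joined a group */
    void *rollover;        /* per-socket buffer allocated while joining */
    pthread_mutex_t lock;  /* plays the role of the fanout mutex */
};

static int fake_group;     /* hypothetical group object the socket links to */

/* The "already joined?" test and the update happen under one mutex,
 * so two threads can never both observe fanout == NULL and both join. */
static int fake_fanout_add(struct fake_sock *s)
{
    int err = 0;

    pthread_mutex_lock(&s->lock);
    if (s->fanout) {
        err = -1;          /* already joined (kernel code would use -EALREADY) */
        goto out;
    }
    s->rollover = malloc(64);
    if (!s->rollover) {
        err = -2;          /* allocation failure: nothing was published */
        goto out;
    }
    s->fanout = &fake_group;
out:
    pthread_mutex_unlock(&s->lock);
    return err;
}

struct join_arg {
    struct fake_sock *s;
    int rc;
};

static void *join_thread(void *p)
{
    struct join_arg *a = p;

    a->rc = fake_fanout_add(a->s);
    return NULL;
}

/* Race nthreads joins on the same socket; return how many succeeded. */
static int run_concurrent_joins(struct fake_sock *s, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct join_arg args[MAX_THREADS];
    int created = 0, ok = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < nthreads; i++) {
        args[i].s = s;
        args[i].rc = 0;
        if (pthread_create(&tid[i], NULL, join_thread, &args[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        if (args[i].rc == 0)
            ok++;
    }
    return ok;
}
```

With the check inside the critical section, exactly one of the racing callers wins and the rest fail cleanly, so no caller ever frees or overwrites a rollover buffer that another caller installed. Moving the `if (s->fanout)` test above `pthread_mutex_lock()` reintroduces the window described in the quoted text. Build with `-pthread`.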