From: Sowmini Varadhan <sowmini.varad...@oracle.com>
Date: Wed, 8 Aug 2018 13:57:13 -0700
> The following deadlock, reported by syzbot, can occur if CPU0 is in
> rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue()
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&rm->m_rs_lock)->rlock);
>                                lock(&rs->rs_recv_lock);
>                                lock(&(&rm->m_rs_lock)->rlock);
>   lock(&rs->rs_recv_lock);
>
> The deadlock should be avoided by moving the messages from the
> rs_recv_queue into a tmp_list in rds_clear_recv_queue() under
> the rs_recv_lock, and then dropping the refcnt on the messages
> in the tmp_list (potentially resulting in rds_message_purge())
> after dropping the rs_recv_lock.
>
> The same lock hierarchy violation also exists in rds_still_queued()
> and should be avoided in a similar manner
>
> Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
> Reported-by: syzbot+52140d69ac6dc6b92...@syzkaller.appspotmail.com

I'm putting this in deferred state for now.

Sowmini, once you and Santosh agree on what exactly to do, please
resubmit.

Thank you.
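The fix proposed in the quoted patch (move queued messages onto a private tmp_list while holding rs_recv_lock, then drop their references only after that lock is released) can be illustrated with a minimal user-space sketch. This is not the RDS code: struct msg, struct sock_q, sock_q_init(), msg_new(), enqueue(), msg_put() and clear_recv_queue() are hypothetical names, pthread mutexes stand in for the kernel's rs_recv_lock and m_rs_lock, and a singly linked list stands in for the kernel's list_head queues.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space analogue; not the RDS source.
 * msg_lock stands in for rm->m_rs_lock, recv_lock for rs->rs_recv_lock. */
struct msg {
    struct msg *next;
    int refcnt;                /* protected by msg_lock */
    pthread_mutex_t msg_lock;
};

struct sock_q {
    pthread_mutex_t recv_lock; /* protects head */
    struct msg *head;
};

static void sock_q_init(struct sock_q *q)
{
    pthread_mutex_init(&q->recv_lock, NULL);
    q->head = NULL;
}

static struct msg *msg_new(int refcnt)
{
    struct msg *m = malloc(sizeof(*m));

    if (!m)
        return NULL;
    m->next = NULL;
    m->refcnt = refcnt;
    pthread_mutex_init(&m->msg_lock, NULL);
    return m;
}

static void enqueue(struct sock_q *q, struct msg *m)
{
    pthread_mutex_lock(&q->recv_lock);
    m->next = q->head;
    q->head = m;
    pthread_mutex_unlock(&q->recv_lock);
}

/* Drop one reference and free the message (the "purge") on the last one.
 * Takes msg_lock, so callers must not hold recv_lock at this point. */
static int msg_put(struct msg *m)
{
    int last;

    pthread_mutex_lock(&m->msg_lock);
    assert(m->refcnt > 0);
    last = (--m->refcnt == 0);
    pthread_mutex_unlock(&m->msg_lock);
    if (last) {
        pthread_mutex_destroy(&m->msg_lock);
        free(m);
    }
    return last;
}

/* Detach every queued message onto tmp_list under recv_lock, then drop
 * the references with recv_lock released. Returns how many were freed. */
static int clear_recv_queue(struct sock_q *q)
{
    struct msg *tmp_list;
    int freed = 0;

    pthread_mutex_lock(&q->recv_lock);
    tmp_list = q->head;
    q->head = NULL;
    pthread_mutex_unlock(&q->recv_lock);

    /* recv_lock is no longer held, so taking msg_lock cannot invert the order */
    while (tmp_list) {
        struct msg *m = tmp_list;

        tmp_list = m->next;
        m->next = NULL;
        freed += msg_put(m);
    }
    return freed;
}
```

Because clear_recv_queue() never holds recv_lock and a msg_lock at the same time, it cannot take part in the CPU0/CPU1 cycle shown in the quoted lockdep report, whatever order other paths use for those two locks.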