On Sat, Dec 10, 2016 at 8:40 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>> On 2016-12-08 22:57, Cong Wang wrote:
>>>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs <r...@redhat.com>
>>>> wrote:
>>>> > I also tried to extend Cong Wang's idea to attempt to proactively
>>>> > respond to a NETLINK_URELEASE on the audit_sock and reset it, but ran
>>>> > into a locking error stack dump using mutex_lock(&audit_cmd_mutex) in
>>>> > the notifier callback. Eliminating the lock since the sock is dead
>>>> > anways eliminates the error.
>>>> >
>>>> > Is it safe? I'll resubmit if this looks remotely sane. Meanwhile I'll
>>>> > try to get the test case to compile.
>>>>
>>>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and
>>>> 'audit_pid' are updated as a whole and race between audit_receive_msg()
>>>> and NETLINK_URELEASE.
>>>
>>> This is what I expected and why I originally added the mutex lock in the
>>> callback... The dumps I got were bare with no wrapper identifying the
>>> process context or specific error, so I'm at a bit of a loss how to
>>> solve this (without thinking more about it) other than instinctively
>>> removing the mutex.
>>
>> Netlink notifier can safely be converted to blocking one, I will send
>> a patch.
>>
>> But I seriously doubt you really need NETLINK_URELEASE here,
>> it adds nothing but overhead, b/c the netlink notifier is called on
>> every netlink socket in the system, but for net exit path, that is
>> relatively a slow path.
>>
>> Also, kauditd_send_skb() needs audit_cmd_mutex too.
>
> Please let me know what you think about the attached patch?

Applied the patch locally and have not seen the bug since then (~24 hours of testing).