I apologies for not explaining the scenario previously. sk_stop_timer() is used to stop the tcp timers with expiry callback tcp_write_timer(), tcp_delack_timer(), tcp_keepalive_timer(), ... del_timer() is used to stop the the timer in sk_stop_timer(), which might return a non-zero result even if one of these timer handler functions (tcp_write_timer(), tcp_delack_timer(), tcp_keepalive_timer(), ...) is already executing on another processor.
Following is the possible scenario :- on CPU #0: sk_stop_timer() decrements the sk->sk_refcnt if del_timer(timer) returns non-zero. on CPU #1: If a timer handler callback runs then it also calls sock_put(sk) which decrements sk->sk_refcnt and if the sk_refcnt becomes zero it frees the structure sock pointed to by sk. if the sk->sk_refcnt decrements twice then that will cause a mismatch in the number of "puts" and "holds" resulting in a malfunction of the sk->sk_refcnt mechanism. The solution is to use del_timer_sync() instead of del_timer() because del_timer_sync() will wait for timer handler functions to complete execution. yes, we are facing some memory corruption issues due to access of already released struct sock in our environment. Our memory corruption issue looks like memory locations being decremented which could be consistent with a rogue decrement of a reference counter. similar suggestion is also made by Dean Jenkins in rfcomm_dlc_clear_timer() and accepted by Marcel. http://www.spinics.net/lists/linux-bluetooth/msg51132.html with warm regards, Deepak Das ________________________________________ From: Eric Dumazet [eric.duma...@gmail.com] Sent: Thursday, August 07, 2014 12:25 PM To: Das, Deepak Cc: da...@davemloft.net; net...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [RFC] net: Replace del_timer() with del_timer_sync() On Thu, 2014-08-07 at 11:48 +0530, Deepak wrote: > on SMP system, del_timer() might return even if the timer function > is running on other cpu so sk_stop_timer() will execute __sock_put() > while timer is accessing the socket on other cpu causing > "use-after-free". > > This commit replaces del_timer() with del_timer_sync() in > sk_stop_timer(). > del_timer_sync() will wait untill the timer function is not running in > any other cpu hence making sk_stop_timer() SMP safe. > > Signed-off-by: Deepak Das <deepak_...@mentor.com> > > diff --git a/net/core/sock.c b/net/core/sock.c > index 026e01f..491a84d 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -2304,7 +2304,7 @@ EXPORT_SYMBOL(sk_reset_timer); > > void sk_stop_timer(struct sock *sk, struct timer_list* timer) > { > - if (del_timer(timer)) > + if (del_timer_sync(timer)) > __sock_put(sk); > } > EXPORT_SYMBOL(sk_stop_timer); There is a reason del_timer() and del_timer_sync() both exist, and both are SMP safe. Here, caller might block timer handler from making progress, you are adding a deadlock condition. In this case, there is no reason to use del_timer_sync(), you didn't explain why you want this to happen in the first place. If you hit a bug somewhere, please share it so that we can root cause it. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/