On Sat, 2017-01-14 at 14:53 +0100, Oliver Hartkopp wrote:
> Hello Eric,
>
> On 01/14/2017 04:43 AM, Liu Shuo wrote:
> > On Thu 12.Jan'17 at 17:33:38 +0100, Oliver Hartkopp wrote:
> >> On 01/12/2017 02:01 PM, Eric Dumazet wrote:
> >
> >>> The main problem seems that the sockets themselves are not RCU
> >>> protected.
> >>>
> >>> If CAN uses RCU for delivery, then sockets should be freed only after
> >>> one RCU grace period.
> >>>
> >>> On recent kernels, following patch could help :
> >>>
> >>
> >> Thanks Eric!
> >>
> >> @Liu ShuoX: Can you check if Eric's suggestion fixes the issue in your
> >> setup?
> > Sorry for late reply. I was OOO yesterday.
> > With Eric's hint, i just found his patch that "net: add SOCK_RCU_FREE
> > socket flag" in the latest kernel. With backporting this one plus Eric's
> > following patch, it fixs my failure.
>
> what would be the best approach to fix this issue - even in stable kernels?
>
> E.g. would this change be ok for a stable as a quick fix?
>
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 1108079d934f..6b974c2b66ef 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -112,6 +112,7 @@ EXPORT_SYMBOL(can_ioctl);
>
>  static void can_sock_destruct(struct sock *sk)
>  {
> +	synchronize_rcu();
>  	skb_queue_purge(&sk->sk_receive_queue);
>  }

Adding a synchronize_rcu() at socket close time might have side effects
if, say, an application has 1000 such sockets and dies.

This might add 20 seconds of exit time and have serious implications.

I will submit the second patch: it works for all Linux versions.

>
> And once this arrived in the mainline tree your suggested patch could be
> applied?
>
> In any case we should not forget to give Reported-by credits to Liu.

Sure
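
For reference, the 20-second estimate above can be reproduced with a rough
back-of-the-envelope model. This is only a sketch: the 20 ms grace period
is an assumption inferred from "1000 sockets, 20 seconds" and is not a
measured value, and the function names are made up for illustration. The
deferred variant assumes the free is queued without blocking (for example
via call_rcu() or the SOCK_RCU_FREE flag), so the exiting task does not
wait at all.

```c
/* Assumption: one RCU grace period costs about 20 ms. This is inferred
 * from the "1000 sockets -> 20 seconds" estimate, not measured. */
#define ASSUMED_GRACE_PERIOD_MS 20L

/* Hypothetical model: every socket destructor calls synchronize_rcu(),
 * so the exiting task waits one full grace period per socket, serially. */
static long blocking_exit_ms(long nr_sockets)
{
	return nr_sockets * ASSUMED_GRACE_PERIOD_MS;
}

/* Hypothetical model: the free is deferred to an RCU callback, so the
 * exiting task only queues work and adds no wait of its own. */
static long deferred_exit_ms(long nr_sockets)
{
	(void)nr_sockets;	/* queuing cost is ignored in this model */
	return 0;
}
```

Under these assumptions blocking_exit_ms(1000) gives 20000 ms, i.e. the
20 seconds mentioned above, while the deferred model adds no exit delay.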