On Wed, 22 May 2019 14:57:33 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Thu, 09 May 2019 21:57:49 -0700, John Fastabend wrote:
> > [...]
> > >
> > Looks like David Beckett managed to trigger another nasty on the
> > release path :/
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000012
> > PGD 0 P4D 0
> > Oops: 0000 [#1] SMP PTI
> > CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21
> > RIP: 0010:tcp_peek_len+0x10/0x60
> > RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051
> > RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480
> > RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000
> > R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40
> > R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020
> > FS:  0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <IRQ>
> >  strp_data_ready+0x48/0x90
> >  tls_data_ready+0x22/0xd0 [tls]
> >  tcp_rcv_established+0x569/0x620
> >  tcp_v4_do_rcv+0x127/0x1e0
> >  tcp_v4_rcv+0xad7/0xbf0
> >  ip_protocol_deliver_rcu+0x2c/0x1c0
> >  ip_local_deliver_finish+0x41/0x50
> >  ip_local_deliver+0x6b/0xe0
> >  ? ip_protocol_deliver_rcu+0x1c0/0x1c0
> >  ip_rcv+0x52/0xd0
> >  ? ip_rcv_finish_core.isra.20+0x380/0x380
> >  __netif_receive_skb_one_core+0x7e/0x90
> >  netif_receive_skb_internal+0x42/0xf0
> >  napi_gro_receive+0xed/0x150
> >  nfp_net_poll+0x7a2/0xd30 [nfp]
> >  ? kmem_cache_free_bulk+0x286/0x310
> >  net_rx_action+0x149/0x3b0
> >  __do_softirq+0xe3/0x30a
> >  ? handle_irq_event_percpu+0x6a/0x80
> >  irq_exit+0xe8/0xf0
> >  do_IRQ+0x85/0xd0
> >  common_interrupt+0xf/0xf
> >  </IRQ>
> > RIP: 0010:cpuidle_enter_state+0xbc/0x450
> >
> > If I read this right strparser calls sock->ops->peek_len(sock), but the
> > sock->sk is already NULL. I'm guessing this is because inet_release()
> > does:
> >
> > 	sock->sk = NULL;
> > 	sk->sk_prot->close(sk, timeout);
> >
> > And I don't really see a way for ktls to know that sock->sk is about to
> > be cleared, and therefore no way to stop strparser. Or for strparser
> > to always do the check, given tcp_peek_len() will do another dereference
> > of sock->sk :S
> >
> > That's mostly a guess, it takes me half an hour of ktls connections
> > running to repro.
> >
> > Any advice would be appreciated.. Can we move the sock->sk assignment
> > after close?..
> >
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 5183a2daba64..aff93e7cdb31 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -428,8 +428,8 @@ int inet_release(struct socket *sock)
> >  		if (sock_flag(sk, SOCK_LINGER) &&
> >  		    !(current->flags & PF_EXITING))
> >  			timeout = sk->sk_lingertime;
> > -		sock->sk = NULL;
> >  		sk->sk_prot->close(sk, timeout);
> > +		sock->sk = NULL;
> >  	}
> >  	return 0;
> >  }
> >
> > I don't see IPv6 clearing this pointer, perhaps we don't have to?

Correction here, IPv6 just calls the IPv4 code, that's why IPv6 was
also fixed after my change.

> > We tested it and it seems to work, but this is pre-git code, so
> > it's hard to tell what the reason to clear was :)
>
> How about making strp_peek_len tolerant of a null sock->sk?
>
> diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> index e137698e8aef..79518f93d2d8 100644
> --- a/net/strparser/strparser.c
> +++ b/net/strparser/strparser.c
> @@ -84,9 +84,10 @@ static void strp_parser_err(struct strparser *strp, int
> err,
>  static inline int strp_peek_len(struct strparser *strp)
>  {
>  	if (strp->sk) {
> -		struct socket *sock = strp->sk->sk_socket;
> +		struct socket *sock = READ_ONCE(strp->sk->sk_socket);
>
> -		return sock->ops->peek_len(sock);
> +		if (likely(sock))
> +			return sock->ops->peek_len(sock);
>  	}

Mmm.. I'm not sure - sk->sk_socket doesn't get cleared AFAICT, the NULL
deref is on sk_state of sock->sk, so sock is non-NULL here, then:

int tcp_peek_len(struct socket *sock)
{
	return tcp_inq(sock->sk);
}
EXPORT_SYMBOL(tcp_peek_len);

Will pass NULL to tcp_inq, which then does:

static inline int tcp_inq(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	int answ;

	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
		answ = 0;

And sk->sk_state is what crashes the machine.
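
To make the ordering argument concrete, here is a rough user-space sketch
of the two release orders. It is not kernel code: every mock_* type and
function, the inq field, and the PEEK_WOULD_OOPS marker are made up for
illustration, and the "packet arrives during close" step is modelled as a
plain call from mock_close(), whereas the real thing is a softirq racing
with inet_release(). It also does not model ktls stopping the strparser
in its close handler, so it says nothing about whether the reordering is
a complete fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct socket / struct sock (not real layouts). */
struct mock_sock;

struct mock_socket {
	struct mock_sock *sk;
};

struct mock_sock {
	struct mock_socket *sk_socket;
	int inq;         /* pretend number of queued bytes */
	int peek_result; /* what the last data_ready callback observed */
};

/* Marker returned where the real tcp_inq() would dereference NULL. */
enum { PEEK_WOULD_OOPS = -1 };

/* Models tcp_peek_len(): it goes through sock->sk, not through strp->sk. */
static int mock_peek_len(struct mock_socket *sock)
{
	struct mock_sock *sk = sock->sk;

	if (!sk)
		return PEEK_WOULD_OOPS;
	return sk->inq;
}

/* Models strp_data_ready(): sk->sk_socket is still set, as noted above. */
static void mock_data_ready(struct mock_sock *sk)
{
	sk->peek_result = mock_peek_len(sk->sk_socket);
}

/* Models ->close() with a packet arriving while it is still running. */
static void mock_close(struct mock_sock *sk)
{
	mock_data_ready(sk);
}

/* Current order: clear sock->sk, then close. */
static int release_clear_first(struct mock_socket *sock)
{
	struct mock_sock *sk = sock->sk;

	sock->sk = NULL;
	mock_close(sk);
	return sk->peek_result;
}

/* Proposed order: close, then clear sock->sk. */
static int release_close_first(struct mock_socket *sock)
{
	struct mock_sock *sk = sock->sk;

	mock_close(sk);
	sock->sk = NULL;
	return sk->peek_result;
}

/* Build a linked socket/sock pair and run one of the release variants. */
static int run(int (*release)(struct mock_socket *))
{
	struct mock_socket sock;
	struct mock_sock sk = { .sk_socket = &sock, .inq = 32, .peek_result = 0 };

	sock.sk = &sk;
	assert(sk.sk_socket->sk == &sk); /* sanity check on the mock wiring */
	return release(&sock);
}
```

In this model run(release_clear_first) reports PEEK_WOULD_OOPS, because
the callback still reaches the socket through sk->sk_socket and finds
sock->sk already cleared, while run(release_close_first) returns the
queued byte count.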